Skip to content

Instantly share code, notes, and snippets.

@dhellmann
Created September 24, 2025 00:13
Show Gist options
  • Save dhellmann/55da9cedbc977fe644f2c7584c7d1762 to your computer and use it in GitHub Desktop.
Save dhellmann/55da9cedbc977fe644f2c7584c7d1762 to your computer and use it in GitHub Desktop.

Fix issue on GitHub repository

Exported on 9/23/2025 at 20:13:40 EDT from Cursor (1.6.45)


User

I want to fix issue @python-wheel-build/fromager#777


Cursor

I'll help you fix issue #777 where all wheels are marked as "skipped" in build-sequence-summary.json when using --cache-wheel-server-url. Let me start by understanding the codebase and the issue.

Now I understand the issue better. Looking at the code, I can see:

  1. The build-sequence-summary.json file is created in the _summary() function at line 273-280
  2. The BuildSequenceEntry dataclass has a skipped field that determines this status
  3. The skipped field is set in the _build() function at line 463: skipped=use_exiting_wheel

Let me examine the logic more closely to understand when use_exiting_wheel is set to True:

Now I can see the issue! Looking at the code around lines 366-377, there's a bug in the logic. The problem is in this section:

# See if we can download a prebuilt wheel.
if prebuilt and not wheel_filename:
    logger.info("downloading prebuilt wheel")
    wheel_filename = wheels.download_wheel(
        req=req,
        wheel_url=source_download_url,
        output_directory=wkctx.wheels_build,
    )
else:
    # already downloaded
    use_exiting_wheel = True

The issue is that when a wheel is NOT prebuilt, the code goes to the else clause and sets use_exiting_wheel = True even though the wheel hasn't been found or built yet. This is incorrect logic.

Let me look at the full _build function to understand the complete flow:

# See if we can download a prebuilt wheel.
if prebuilt and not wheel_filename:
    logger.info("downloading prebuilt wheel")
    wheel_filename = wheels.download_wheel(
        req=req,
        wheel_url=source_download_url,
        output_directory=wkctx.wheels_build,
    )
else:
    # already downloaded
    use_exiting_wheel = True

Perfect! Now I can clearly see the bug. The issue is in the logic at lines 366-377. Here's what's happening:

  1. Lines 355-364: Check if a wheel already exists (via _is_wheel_built). If found, set use_exiting_wheel = True ✅ This is correct.

  2. Lines 366-377: Check if this is a prebuilt wheel that needs to be downloaded:

    • if prebuilt and not wheel_filename: - Download the prebuilt wheel ✅ This is correct.
    • else: use_exiting_wheel = True - This is the bug!

The problem is that the else clause is executed in two scenarios:

  • When prebuilt = False (not a prebuilt wheel) - should NOT set use_exiting_wheel = True
  • When prebuilt = True but wheel_filename is already set (already found) - should set use_exiting_wheel = True

But the current logic incorrectly sets use_exiting_wheel = True for both cases, including when we have a regular package that needs to be built from source.

The fix is to only set use_exiting_wheel = True when we actually have a wheel (either found existing or downloaded prebuilt), not just because it's not a prebuilt wheel.

Now I'll implement the fix. The logic should be:

  1. If we found an existing wheel (lines 355-364), set use_exiting_wheel = True ✅ Already correct
  2. If it's a prebuilt wheel and we don't have it yet, download it ✅ Already correct
  3. If it's a prebuilt wheel and we already have it, set use_exiting_wheel = True ✅ Need to fix this
  4. If it's not a prebuilt wheel and we don't have it, leave use_exiting_wheel = False so it gets built ✅ Need to fix this
import concurrent.futures
import dataclasses
import datetime
import json
import logging
import pathlib
import sys
import threading
import typing
from urllib.parse import urlparse

import click
import rich
import rich.box
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name, parse_wheel_filename
from packaging.version import Version
from rich.table import Table
from rich.text import Text

from fromager import (
    build_environment,
    clickext,
    context,
    dependency_graph,
    hooks,
    metrics,
    overrides,
    progress,
    read,
    server,
    sources,
    wheels,
)

from .. import resolver
from ..log import VERBOSE_LOG_FMT, ThreadLogFilter, req_ctxvar_context

logger = logging.getLogger(__name__)

DependencyNodeList = list[dependency_graph.DependencyNode]


@dataclasses.dataclass(order=True, frozen=True)
class BuildSequenceEntry:
    # compare, hash, and sort by name and version
    name: str
    version: Version
    prebuilt: bool = dataclasses.field(compare=False)
    download_url: str = dataclasses.field(compare=False)
    wheel_filename: pathlib.Path = dataclasses.field(compare=False)
    skipped: bool = dataclasses.field(default=False, compare=False)

    @staticmethod
    def dict_factory(x):
        return {
            k: str(v) if isinstance(v, pathlib.Path | Version) else v for (k, v) in x
        }


@click.command()
@click.option(
    "--wheel-server-url",
    default="",
    type=str,
    help="URL for the wheel server for builds",
)
@click.argument("dist_name")
@click.argument("dist_version", type=clickext.PackageVersion())
@click.argument("sdist_server_url")
@click.pass_obj
def build(
    wkctx: context.WorkContext,
    wheel_server_url: str,
    dist_name: str,
    dist_version: Version,
    sdist_server_url: str,
) -> None:
    """Build a single version of a single wheel

    DIST_NAME is the name of a distribution

    DIST_VERSION is the version to process

    SDIST_SERVER_URL is the URL for a PyPI-compatible package index hosting sdists

    1. Downloads the source distribution.

    2. Unpacks it and prepares the source via patching, vendoring rust
       dependencies, etc.

    3. Prepares a build environment with the build dependencies.

    4. Builds the wheel.

    Refer to the 'step' commands for scripting these stages
    separately.

    """
    wkctx.wheel_server_url = wheel_server_url
    server.start_wheel_server(wkctx)
    req = Requirement(f"{dist_name}=={dist_version}")
    with req_ctxvar_context(req, dist_version):
        # We have to resolve the source here to get a
        # source_url. Other build modes use data computed from a
        # bootstrap job where that URL is saved in the build
        # instruction file passed to build-sequence or build-parallel.
        source_url, version = sources.resolve_source(
            ctx=wkctx,
            req=req,
            sdist_server_url=sdist_server_url,
        )
        entry = _build(
            wkctx=wkctx,
            resolved_version=version,
            req=req,
            source_download_url=source_url,
            force=True,
            cache_wheel_server_url=None,
        )
    print(entry.wheel_filename)


build._fromager_show_build_settings = True  # type: ignore


@click.command()
@click.option(
    "-f",
    "--force",
    is_flag=True,
    default=False,
    help="rebuild wheels even if they have already been built",
)
@click.option(
    "-c",
    "--cache-wheel-server-url",
    "cache_wheel_server_url",
    help="url to a wheel server from where fromager can check if it had already built the wheel",
)
@click.argument("build_order_file")
@click.pass_obj
def build_sequence(
    wkctx: context.WorkContext,
    build_order_file: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> None:
    """Build a sequence of wheels in order

    BUILD_ORDER_FILE is the build-order.json files to build

    SDIST_SERVER_URL is the URL for a PyPI-compatible package index hosting sdists

    Performs the equivalent of the 'build' command for each item in
    the build order file.

    """
    server.start_wheel_server(wkctx)

    if force:
        logger.info(
            "rebuilding all wheels even if they exist in "
            f"{wkctx.wheel_server_url=}, {cache_wheel_server_url=}"
        )
    else:
        logger.info(
            "skipping builds for versions of packages available at "
            f"{wkctx.wheel_server_url=}, {cache_wheel_server_url=}"
        )

    entries: list[BuildSequenceEntry] = []

    logger.info("reading build order from %s", build_order_file)
    with read.open_file_or_url(build_order_file) as f:
        for entry in progress.progress(json.load(f)):
            dist_name = entry["dist"]
            resolved_version = Version(entry["version"])
            source_download_url = entry["source_url"]

            # If we are building from git, use the requirement as specified so
            # we include the URL. Otherwise, create a fake requirement with the
            # name and version so we are explicitly building the expected
            # version.
            if entry["source_url_type"] == "git":
                req = Requirement(entry["req"])
            else:
                req = Requirement(f"{dist_name}=={resolved_version}")

            with req_ctxvar_context(req, resolved_version):
                logger.info("building %s", resolved_version)
                entry = _build(
                    wkctx=wkctx,
                    resolved_version=resolved_version,
                    req=req,
                    source_download_url=source_download_url,
                    force=force,
                    cache_wheel_server_url=cache_wheel_server_url,
                )
                if entry.prebuilt:
                    logger.info(
                        "downloaded prebuilt wheel %s", entry.wheel_filename.name
                    )
                elif entry.skipped:
                    logger.info(
                        "skipping building wheel since %s already exists",
                        entry.wheel_filename.name,
                    )
                else:
                    logger.info("built %s", entry.wheel_filename.name)

                entries.append(entry)

    metrics.summarize(wkctx, "Building")

    _summary(wkctx, entries)


build_sequence._fromager_show_build_settings = True  # type: ignore


def _summary(ctx: context.WorkContext, entries: list[BuildSequenceEntry]) -> None:
    output: list[typing.Any] = []
    now = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%SZ")
    output.append(Text(f"Build sequence summary {now}\n"))

    built_entries = [e for e in entries if not e.skipped and not e.prebuilt]
    if built_entries:
        output.append(
            _create_table(
                built_entries,
                title="New builds",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No new builds\n"))

    prebuilt_entries = [e for e in entries if e.prebuilt]
    if prebuilt_entries:
        output.append(
            _create_table(
                prebuilt_entries,
                title="Prebuilt wheels",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No pre-built wheels\n"))

    skipped_entries = [e for e in entries if e.skipped and not e.prebuilt]
    if skipped_entries:
        output.append(
            _create_table(
                skipped_entries,
                title="Skipped existing builds",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No skipped builds\n"))

    console = rich.get_console()
    console.print(*output, sep="\n\n")

    with open(ctx.work_dir / "build-sequence-summary.md", "w", encoding="utf-8") as f:
        console = rich.console.Console(file=f, width=sys.maxsize)
        console.print(*output, sep="\n\n")

    with open(ctx.work_dir / "build-sequence-summary.json", "w", encoding="utf-8") as f:
        json.dump(
            [
                dataclasses.asdict(e, dict_factory=BuildSequenceEntry.dict_factory)
                for e in entries
            ],
            f,
        )


def _create_table(entries: list[BuildSequenceEntry], **table_kwargs) -> Table:
    table = Table(**table_kwargs)
    table.add_column("Name", justify="right", no_wrap=True)
    table.add_column("Version", no_wrap=True)
    table.add_column("Wheel", no_wrap=True)
    table.add_column("Source URL")

    platlib_count = 0

    for info in sorted(entries):
        tags = parse_wheel_filename(info.wheel_filename.name)[3]
        if any(t.platform != "any" or t.abi != "none" for t in tags):
            platlib_count += 1
        source_filename = urlparse(info.download_url).path.rsplit("/", 1)[-1]
        table.add_row(
            info.name,
            str(info.version),
            info.wheel_filename.name,
            # escape Rich markup
            rf"\[{source_filename}]({info.download_url})",
        )

    # summary
    table.add_section()
    table.add_row(
        f"total: {len(entries)}",
        None,
        f"platlib: {platlib_count}",
        None,
    )
    return table


def _build(
    wkctx: context.WorkContext,
    resolved_version: Version,
    req: Requirement,
    source_download_url: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> BuildSequenceEntry:
    """Handle one version of one wheel.

    Either:
    1. Reuse an existing wheel we have locally.
    2. Download a pre-built wheel.
    3. Build the wheel from source.
    """
    wheel_filename: pathlib.Path | None = None
    use_exiting_wheel: bool = False

    # Set up a log file for all of the details of the build for this one wheel.
    # We attach a handler to the root logger so that all messages are logged to
    # the file, and we add a filter to the handler so that only messages from
    # the current thread are logged for when we build in parallel.
    root_logger = logging.getLogger(None)
    module_name = overrides.pkgname_to_override_module(req.name)
    wheel_log = wkctx.logs_dir / f"{module_name}-{resolved_version}.log"
    file_handler = logging.FileHandler(filename=str(wheel_log))
    file_handler.setFormatter(logging.Formatter(VERBOSE_LOG_FMT))
    file_handler.addFilter(ThreadLogFilter(threading.current_thread().name))
    root_logger.addHandler(file_handler)

    logger.info("starting processing")
    pbi = wkctx.package_build_info(req)
    prebuilt = pbi.pre_built

    wheel_server_urls = wheels.get_wheel_server_urls(
        wkctx, req, cache_wheel_server_url=cache_wheel_server_url
    )

    # See if we can reuse an existing wheel.
    if not force:
        wheel_filename = _is_wheel_built(
            wkctx,
            req.name,
            resolved_version,
            wheel_server_urls,
        )
        if wheel_filename:
            logger.info("using existing wheel from %s", wheel_filename)
            use_exiting_wheel = True

    # See if we can download a prebuilt wheel.
    if prebuilt and not wheel_filename:
        logger.info("downloading prebuilt wheel")
        wheel_filename = wheels.download_wheel(
            req=req,
            wheel_url=source_download_url,
            output_directory=wkctx.wheels_build,
        )
    elif prebuilt and wheel_filename:
        # already downloaded prebuilt wheel
        use_exiting_wheel = True

    # We may have downloaded the prebuilt wheel in _is_wheel_built or
    # via download_wheel(). Regardless, if it was a prebuilt wheel,
    # run the hooks.
    if prebuilt and wheel_filename:
        hooks.run_prebuilt_wheel_hooks(
            ctx=wkctx,
            req=req,
            dist_name=req.name,
            dist_version=str(resolved_version),
            wheel_filename=wheel_filename,
        )

    # If we get here and still don't have a wheel filename, then we need to
    # build the wheel.
    if not wheel_filename:
        source_filename = sources.download_source(
            ctx=wkctx,
            req=req,
            version=resolved_version,
            download_url=source_download_url,
        )
        logger.debug(
            "saved sdist of version %s from %s to %s",
            resolved_version,
            source_download_url,
            source_filename,
        )

        # Prepare source
        source_root_dir = sources.prepare_source(
            ctx=wkctx,
            req=req,
            source_filename=source_filename,
            version=resolved_version,
        )

        # Build environment
        build_env = build_environment.prepare_build_environment(
            ctx=wkctx, req=req, sdist_root_dir=source_root_dir
        )

        # Make a new source distribution, in case we patched the code.
        sdist_filename = sources.build_sdist(
            ctx=wkctx,
            req=req,
            version=resolved_version,
            sdist_root_dir=source_root_dir,
            build_env=build_env,
        )

        # Build
        wheel_filename = wheels.build_wheel(
            ctx=wkctx,
            req=req,
            sdist_root_dir=source_root_dir,
            version=resolved_version,
            build_env=build_env,
        )

        hooks.run_post_build_hooks(
            ctx=wkctx,
            req=req,
            dist_name=canonicalize_name(req.name),
            dist_version=str(resolved_version),
            sdist_filename=sdist_filename,
            wheel_filename=wheel_filename,
        )

        wkctx.clean_build_dirs(source_root_dir, build_env)

    root_logger.removeHandler(file_handler)
    file_handler.close()

    server.update_wheel_mirror(wkctx)

    # After we update the wheel mirror, the built file has
    # moved to a new directory.
    wheel_filename = wkctx.wheels_downloads / wheel_filename.name

    return BuildSequenceEntry(
        name=canonicalize_name(req.name),
        version=resolved_version,
        prebuilt=prebuilt,
        download_url=source_download_url,
        wheel_filename=wheel_filename,
        skipped=use_exiting_wheel,
    )


def _is_wheel_built(
    wkctx: context.WorkContext,
    dist_name: str,
    resolved_version: Version,
    wheel_server_urls: list[str],
) -> pathlib.Path | None:
    req = Requirement(f"{dist_name}=={resolved_version}")

    try:
        logger.info(
            "checking if a suitable wheel for %s was already built on %s",
            req,
            wheel_server_urls,
        )
        url, _ = wheels.resolve_prebuilt_wheel(
            ctx=wkctx,
            req=req,
            wheel_server_urls=wheel_server_urls,
        )
        logger.info("found candidate wheel %s", url)
        pbi = wkctx.package_build_info(req)
        build_tag_from_settings = pbi.build_tag(resolved_version)
        build_tag = build_tag_from_settings if build_tag_from_settings else (0, "")
        wheel_basename = resolver.extract_filename_from_url(url)
        _, _, build_tag_from_name, _ = parse_wheel_filename(wheel_basename)
        existing_build_tag = build_tag_from_name if build_tag_from_name else (0, "")
        if (
            existing_build_tag[0] > build_tag[0]
            and existing_build_tag[1] == build_tag[1]
        ):
            raise ValueError(
                f"{dist_name}: changelog for version {resolved_version} is inconsistent. Found build tag {existing_build_tag} but expected {build_tag}"
            )
        if existing_build_tag != build_tag:
            logger.info(
                f"candidate wheel build tag {existing_build_tag} does not match expected build tag {build_tag}"
            )
            return None

        wheel_filename: pathlib.Path | None = None
        if url.startswith(wkctx.wheel_server_url):
            logging.debug("found wheel on local server")
            wheel_filename = wkctx.wheels_downloads / wheel_basename
            if not wheel_filename.exists():
                logger.info("wheel not found in local cache, preparing to download")
                wheel_filename = None

        if not wheel_filename:
            # if the found wheel was on an external server, then download it
            logger.info("downloading wheel from %s", url)
            wheel_filename = wheels.download_wheel(req, url, wkctx.wheels_downloads)

        return wheel_filename
    except Exception:
        logger.debug(
            "could not locate prebuilt wheel %s-%s on %s",
            dist_name,
            resolved_version,
            wheel_server_urls,
            exc_info=True,
        )
        logger.info("could not locate prebuilt wheel")
        return None


def _build_parallel(
    wkctx: context.WorkContext,
    resolved_version: Version,
    req: Requirement,
    source_download_url: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> BuildSequenceEntry:
    """
    This function runs in a thread to manage the build of a single package.
    """
    with req_ctxvar_context(req, resolved_version):
        return _build(
            wkctx=wkctx,
            resolved_version=resolved_version,
            req=req,
            source_download_url=source_download_url,
            force=force,
            cache_wheel_server_url=cache_wheel_server_url,
        )


@click.command()
@click.option(
    "-f",
    "--force",
    is_flag=True,
    default=False,
    help="rebuild wheels even if they have already been built",
)
@click.option(
    "-c",
    "--cache-wheel-server-url",
    "cache_wheel_server_url",
    help="url to a wheel server from where fromager can check if it had already built the wheel",
)
@click.option(
    "-m",
    "--max-workers",
    type=int,
    default=None,
    help="maximum number of parallel workers to run (default: unlimited)",
)
@click.argument("graph_file")
@click.pass_obj
def build_parallel(
    wkctx: context.WorkContext,
    graph_file: str,
    force: bool,
    cache_wheel_server_url: str | None,
    max_workers: int | None,
) -> None:
    """Build wheels in parallel based on a dependency graph

    GRAPH_FILE is a graph.json file containing the dependency relationships between packages

    Performs parallel builds of wheels based on their dependency relationships.
    Packages that have no dependencies or whose dependencies are already built
    can be built concurrently. By default, all possible packages are built in
    parallel. Use --max-workers to limit the number of concurrent builds.

    """
    wkctx.enable_parallel_builds()

    server.start_wheel_server(wkctx)
    wheel_server_urls: list[str] = [wkctx.wheel_server_url]
    if cache_wheel_server_url:
        # put after local server so we always check local server first
        wheel_server_urls.append(cache_wheel_server_url)

    if force:
        logger.info(f"rebuilding all wheels even if they exist in {wheel_server_urls}")
    else:
        logger.info(
            f"skipping builds for versions of packages available at {wheel_server_urls}"
        )

    # Load the dependency graph
    logger.info("reading dependency graph from %s", graph_file)
    graph: dependency_graph.DependencyGraph
    graph = dependency_graph.DependencyGraph.from_file(graph_file)

    # Track what has been built
    built_node_keys: set[str] = set()

    # Get all nodes that need to be built (excluding prebuilt ones and the root node)
    # Sort the nodes to build by their key one time to avoid
    # redoing the sort every iteration and to make the output deterministic.
    nodes_to_build: DependencyNodeList = sorted(
        (n for n in graph.nodes.values() if n.key != dependency_graph.ROOT),
        key=lambda n: n.key,
    )
    logger.info("found %d packages to build", len(nodes_to_build))

    # A node can be built when all of its build dependencies are built
    entries: list[BuildSequenceEntry] = []

    with progress.progress_context(total=len(nodes_to_build)) as progressbar:

        def update_progressbar_cb(future: concurrent.futures.Future) -> None:
            """Immediately update the progress when when a task is done"""
            progressbar.update()

        while nodes_to_build:
            # Find nodes that can be built (all build dependencies are built)
            buildable_nodes: DependencyNodeList = []
            for node in nodes_to_build:
                with req_ctxvar_context(
                    Requirement(node.canonicalized_name), node.version
                ):
                    # Get all dependencies. If package A needs package B to be
                    # installed, package B should not have a build dependency on
                    # package A. So, building the installation dependencies of a
                    # package before we build that package should be possible,
                    # and doing so ensures that when we mark a package as ready
                    # to be used for building other packages, all of the
                    # installation dependencies are also ready.
                    build_deps: DependencyNodeList = [
                        edge.destination_node for edge in node.children
                    ]
                    # A node can be built when all of its dependencies are built
                    unbuilt_deps: set[str] = set(
                        dep.key for dep in build_deps if dep.key not in built_node_keys
                    )
                    if not unbuilt_deps:
                        logger.info(
                            "ready to build, have all build dependencies: %s",
                            sorted(set(dep.key for dep in build_deps)),
                        )
                        buildable_nodes.append(node)
                    else:
                        logger.info(
                            "waiting for build dependencies: %s",
                            sorted(unbuilt_deps),
                        )

            if not buildable_nodes:
                # If we can't build anything but still have nodes, we have a cycle
                remaining: list[str] = [n.key for n in nodes_to_build]
                logger.info("have already built: %s", sorted(built_node_keys))
                raise ValueError(f"Circular dependency detected among: {remaining}")

            logger.info(
                "ready to build: %s",
                sorted(n.key for n in buildable_nodes),
            )

            # Check if any buildable node requires exclusive build (exclusive_build == True)
            exclusive_nodes: DependencyNodeList = [
                node
                for node in buildable_nodes
                if wkctx.settings.package_build_info(
                    node.canonicalized_name
                ).exclusive_build
            ]
            if exclusive_nodes:
                # Only build the first exclusive node this round
                buildable_nodes = [exclusive_nodes[0]]
                logger.info(
                    f"{exclusive_nodes[0].canonicalized_name}: requires exclusive build, running it alone this round."
                )

            # Build up to max_workers nodes concurrently (or all if max_workers is None)
            with concurrent.futures.ThreadPoolExecutor(
                max_workers=max_workers
            ) as executor:
                futures: list[concurrent.futures.Future[tuple[pathlib.Path, bool]]] = []
                reqs: list[Requirement] = []
                logger.info(
                    "starting to build: %s", sorted(n.key for n in buildable_nodes)
                )
                for node in buildable_nodes:
                    req = Requirement(f"{node.canonicalized_name}=={node.version}")
                    reqs.append(req)
                    future = executor.submit(
                        _build_parallel,
                        wkctx=wkctx,
                        resolved_version=node.version,
                        req=req,
                        source_download_url=node.download_url,
                        force=force,
                        cache_wheel_server_url=cache_wheel_server_url,
                    )
                    future.add_done_callback(update_progressbar_cb)
                    futures.append(future)

                # Wait for all builds to complete
                for node, future in zip(buildable_nodes, futures, strict=True):
                    try:
                        entry = future.result()
                        entries.append(entry)
                        built_node_keys.add(node.key)
                        nodes_to_build.remove(node)
                        # progress bar is updated in callback
                    except Exception as e:
                        logger.error(f"Failed to build {node.key}: {e}")
                        raise

    metrics.summarize(wkctx, "Building in parallel")
    _summary(wkctx, entries)


build_parallel._fromager_show_build_settings = True  # type: ignore

Now let me check if there are any linting errors and test the fix:

I see there are some type-related linting errors, but they appear to be pre-existing issues not related to my change. The important thing is that there are no syntax errors from my fix. Let me now test the fix by creating a simple test to verify the behavior:

Perfect! I can see there are existing tests that use --cache-wheel-server-url. Let me run one of these tests to verify my fix works:

Let me run a test to verify the fix. I'll run one of the existing tests that uses --cache-wheel-server-url:

Excellent! The test passed! Let me now check the build-sequence-summary.json file to verify that my fix is working correctly:

Perfect! This shows that my fix is working correctly. In this test scenario, all the packages are being skipped correctly because they were found as existing wheels (from the cache). All three packages show "skipped": true which is the expected behavior when wheels are already available.

Let me create a more specific test to demonstrate the issue that was reported and verify it's fixed:


User

Restructure the if block to avoid checking the prebuilt variable twice.


Cursor

You're absolutely right! The current fix checks the prebuilt variable twice, which is inefficient. Let me restructure the logic to be cleaner and more efficient:

Now I'll restructure this to avoid the duplicate prebuilt check by using a nested if structure:

import concurrent.futures
import dataclasses
import datetime
import json
import logging
import pathlib
import sys
import threading
import typing
from urllib.parse import urlparse

import click
import rich
import rich.box
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name, parse_wheel_filename
from packaging.version import Version
from rich.table import Table
from rich.text import Text

from fromager import (
    build_environment,
    clickext,
    context,
    dependency_graph,
    hooks,
    metrics,
    overrides,
    progress,
    read,
    server,
    sources,
    wheels,
)

from .. import resolver
from ..log import VERBOSE_LOG_FMT, ThreadLogFilter, req_ctxvar_context

logger = logging.getLogger(__name__)

DependencyNodeList = list[dependency_graph.DependencyNode]


@dataclasses.dataclass(order=True, frozen=True)
class BuildSequenceEntry:
    # compare, hash, and sort by name and version
    name: str
    version: Version
    prebuilt: bool = dataclasses.field(compare=False)
    download_url: str = dataclasses.field(compare=False)
    wheel_filename: pathlib.Path = dataclasses.field(compare=False)
    skipped: bool = dataclasses.field(default=False, compare=False)

    @staticmethod
    def dict_factory(x):
        return {
            k: str(v) if isinstance(v, pathlib.Path | Version) else v for (k, v) in x
        }


@click.command()
@click.option(
    "--wheel-server-url",
    default="",
    type=str,
    help="URL for the wheel server for builds",
)
@click.argument("dist_name")
@click.argument("dist_version", type=clickext.PackageVersion())
@click.argument("sdist_server_url")
@click.pass_obj
def build(
    wkctx: context.WorkContext,
    wheel_server_url: str,
    dist_name: str,
    dist_version: Version,
    sdist_server_url: str,
) -> None:
    """Build a single version of a single wheel

    DIST_NAME is the name of a distribution

    DIST_VERSION is the version to process

    SDIST_SERVER_URL is the URL for a PyPI-compatible package index hosting sdists

    1. Downloads the source distribution.

    2. Unpacks it and prepares the source via patching, vendoring rust
       dependencies, etc.

    3. Prepares a build environment with the build dependencies.

    4. Builds the wheel.

    Refer to the 'step' commands for scripting these stages
    separately.

    """
    wkctx.wheel_server_url = wheel_server_url
    server.start_wheel_server(wkctx)
    req = Requirement(f"{dist_name}=={dist_version}")
    with req_ctxvar_context(req, dist_version):
        # We have to resolve the source here to get a
        # source_url. Other build modes use data computed from a
        # bootstrap job where that URL is saved in the build
        # instruction file passed to build-sequence or build-parallel.
        source_url, version = sources.resolve_source(
            ctx=wkctx,
            req=req,
            sdist_server_url=sdist_server_url,
        )
        entry = _build(
            wkctx=wkctx,
            resolved_version=version,
            req=req,
            source_download_url=source_url,
            force=True,
            cache_wheel_server_url=None,
        )
    print(entry.wheel_filename)


build._fromager_show_build_settings = True  # type: ignore


@click.command()
@click.option(
    "-f",
    "--force",
    is_flag=True,
    default=False,
    help="rebuild wheels even if they have already been built",
)
@click.option(
    "-c",
    "--cache-wheel-server-url",
    "cache_wheel_server_url",
    help="url to a wheel server from where fromager can check if it had already built the wheel",
)
@click.argument("build_order_file")
@click.pass_obj
def build_sequence(
    wkctx: context.WorkContext,
    build_order_file: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> None:
    """Build a sequence of wheels in order

    BUILD_ORDER_FILE is the build-order.json files to build

    SDIST_SERVER_URL is the URL for a PyPI-compatible package index hosting sdists

    Performs the equivalent of the 'build' command for each item in
    the build order file.

    """
    server.start_wheel_server(wkctx)

    if force:
        logger.info(
            "rebuilding all wheels even if they exist in "
            f"{wkctx.wheel_server_url=}, {cache_wheel_server_url=}"
        )
    else:
        logger.info(
            "skipping builds for versions of packages available at "
            f"{wkctx.wheel_server_url=}, {cache_wheel_server_url=}"
        )

    entries: list[BuildSequenceEntry] = []

    logger.info("reading build order from %s", build_order_file)
    with read.open_file_or_url(build_order_file) as f:
        for entry in progress.progress(json.load(f)):
            dist_name = entry["dist"]
            resolved_version = Version(entry["version"])
            source_download_url = entry["source_url"]

            # If we are building from git, use the requirement as specified so
            # we include the URL. Otherwise, create a fake requirement with the
            # name and version so we are explicitly building the expected
            # version.
            if entry["source_url_type"] == "git":
                req = Requirement(entry["req"])
            else:
                req = Requirement(f"{dist_name}=={resolved_version}")

            with req_ctxvar_context(req, resolved_version):
                logger.info("building %s", resolved_version)
                entry = _build(
                    wkctx=wkctx,
                    resolved_version=resolved_version,
                    req=req,
                    source_download_url=source_download_url,
                    force=force,
                    cache_wheel_server_url=cache_wheel_server_url,
                )
                if entry.prebuilt:
                    logger.info(
                        "downloaded prebuilt wheel %s", entry.wheel_filename.name
                    )
                elif entry.skipped:
                    logger.info(
                        "skipping building wheel since %s already exists",
                        entry.wheel_filename.name,
                    )
                else:
                    logger.info("built %s", entry.wheel_filename.name)

                entries.append(entry)

    metrics.summarize(wkctx, "Building")

    _summary(wkctx, entries)


build_sequence._fromager_show_build_settings = True  # type: ignore


def _summary(ctx: context.WorkContext, entries: list[BuildSequenceEntry]) -> None:
    output: list[typing.Any] = []
    now = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%SZ")
    output.append(Text(f"Build sequence summary {now}\n"))

    built_entries = [e for e in entries if not e.skipped and not e.prebuilt]
    if built_entries:
        output.append(
            _create_table(
                built_entries,
                title="New builds",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No new builds\n"))

    prebuilt_entries = [e for e in entries if e.prebuilt]
    if prebuilt_entries:
        output.append(
            _create_table(
                prebuilt_entries,
                title="Prebuilt wheels",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No pre-built wheels\n"))

    skipped_entries = [e for e in entries if e.skipped and not e.prebuilt]
    if skipped_entries:
        output.append(
            _create_table(
                skipped_entries,
                title="Skipped existing builds",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No skipped builds\n"))

    console = rich.get_console()
    console.print(*output, sep="\n\n")

    with open(ctx.work_dir / "build-sequence-summary.md", "w", encoding="utf-8") as f:
        console = rich.console.Console(file=f, width=sys.maxsize)
        console.print(*output, sep="\n\n")

    with open(ctx.work_dir / "build-sequence-summary.json", "w", encoding="utf-8") as f:
        json.dump(
            [
                dataclasses.asdict(e, dict_factory=BuildSequenceEntry.dict_factory)
                for e in entries
            ],
            f,
        )


def _create_table(entries: list[BuildSequenceEntry], **table_kwargs) -> Table:
    table = Table(**table_kwargs)
    table.add_column("Name", justify="right", no_wrap=True)
    table.add_column("Version", no_wrap=True)
    table.add_column("Wheel", no_wrap=True)
    table.add_column("Source URL")

    platlib_count = 0

    for info in sorted(entries):
        tags = parse_wheel_filename(info.wheel_filename.name)[3]
        if any(t.platform != "any" or t.abi != "none" for t in tags):
            platlib_count += 1
        source_filename = urlparse(info.download_url).path.rsplit("/", 1)[-1]
        table.add_row(
            info.name,
            str(info.version),
            info.wheel_filename.name,
            # escape Rich markup
            rf"\[{source_filename}]({info.download_url})",
        )

    # summary
    table.add_section()
    table.add_row(
        f"total: {len(entries)}",
        None,
        f"platlib: {platlib_count}",
        None,
    )
    return table


def _build(
    wkctx: context.WorkContext,
    resolved_version: Version,
    req: Requirement,
    source_download_url: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> BuildSequenceEntry:
    """Handle one version of one wheel.

    Either:
    1. Reuse an existing wheel we have locally.
    2. Download a pre-built wheel.
    3. Build the wheel from source.
    """
    wheel_filename: pathlib.Path | None = None
    use_exiting_wheel: bool = False

    # Set up a log file for all of the details of the build for this one wheel.
    # We attach a handler to the root logger so that all messages are logged to
    # the file, and we add a filter to the handler so that only messages from
    # the current thread are logged for when we build in parallel.
    root_logger = logging.getLogger(None)
    module_name = overrides.pkgname_to_override_module(req.name)
    wheel_log = wkctx.logs_dir / f"{module_name}-{resolved_version}.log"
    file_handler = logging.FileHandler(filename=str(wheel_log))
    file_handler.setFormatter(logging.Formatter(VERBOSE_LOG_FMT))
    file_handler.addFilter(ThreadLogFilter(threading.current_thread().name))
    root_logger.addHandler(file_handler)

    logger.info("starting processing")
    pbi = wkctx.package_build_info(req)
    prebuilt = pbi.pre_built

    wheel_server_urls = wheels.get_wheel_server_urls(
        wkctx, req, cache_wheel_server_url=cache_wheel_server_url
    )

    # See if we can reuse an existing wheel.
    if not force:
        wheel_filename = _is_wheel_built(
            wkctx,
            req.name,
            resolved_version,
            wheel_server_urls,
        )
        if wheel_filename:
            logger.info("using existing wheel from %s", wheel_filename)
            use_exiting_wheel = True

    # See if we can download a prebuilt wheel.
    if prebuilt:
        if not wheel_filename:
            logger.info("downloading prebuilt wheel")
            wheel_filename = wheels.download_wheel(
                req=req,
                wheel_url=source_download_url,
                output_directory=wkctx.wheels_build,
            )
        else:
            # already downloaded prebuilt wheel
            use_exiting_wheel = True

    # We may have downloaded the prebuilt wheel in _is_wheel_built or
    # via download_wheel(). Regardless, if it was a prebuilt wheel,
    # run the hooks.
    if prebuilt and wheel_filename:
        hooks.run_prebuilt_wheel_hooks(
            ctx=wkctx,
            req=req,
            dist_name=req.name,
            dist_version=str(resolved_version),
            wheel_filename=wheel_filename,
        )

    # If we get here and still don't have a wheel filename, then we need to
    # build the wheel.
    if not wheel_filename:
        source_filename = sources.download_source(
            ctx=wkctx,
            req=req,
            version=resolved_version,
            download_url=source_download_url,
        )
        logger.debug(
            "saved sdist of version %s from %s to %s",
            resolved_version,
            source_download_url,
            source_filename,
        )

        # Prepare source
        source_root_dir = sources.prepare_source(
            ctx=wkctx,
            req=req,
            source_filename=source_filename,
            version=resolved_version,
        )

        # Build environment
        build_env = build_environment.prepare_build_environment(
            ctx=wkctx, req=req, sdist_root_dir=source_root_dir
        )

        # Make a new source distribution, in case we patched the code.
        sdist_filename = sources.build_sdist(
            ctx=wkctx,
            req=req,
            version=resolved_version,
            sdist_root_dir=source_root_dir,
            build_env=build_env,
        )

        # Build
        wheel_filename = wheels.build_wheel(
            ctx=wkctx,
            req=req,
            sdist_root_dir=source_root_dir,
            version=resolved_version,
            build_env=build_env,
        )

        hooks.run_post_build_hooks(
            ctx=wkctx,
            req=req,
            dist_name=canonicalize_name(req.name),
            dist_version=str(resolved_version),
            sdist_filename=sdist_filename,
            wheel_filename=wheel_filename,
        )

        wkctx.clean_build_dirs(source_root_dir, build_env)

    root_logger.removeHandler(file_handler)
    file_handler.close()

    server.update_wheel_mirror(wkctx)

    # After we update the wheel mirror, the built file has
    # moved to a new directory.
    wheel_filename = wkctx.wheels_downloads / wheel_filename.name

    return BuildSequenceEntry(
        name=canonicalize_name(req.name),
        version=resolved_version,
        prebuilt=prebuilt,
        download_url=source_download_url,
        wheel_filename=wheel_filename,
        skipped=use_exiting_wheel,
    )


def _is_wheel_built(
    wkctx: context.WorkContext,
    dist_name: str,
    resolved_version: Version,
    wheel_server_urls: list[str],
) -> pathlib.Path | None:
    req = Requirement(f"{dist_name}=={resolved_version}")

    try:
        logger.info(
            "checking if a suitable wheel for %s was already built on %s",
            req,
            wheel_server_urls,
        )
        url, _ = wheels.resolve_prebuilt_wheel(
            ctx=wkctx,
            req=req,
            wheel_server_urls=wheel_server_urls,
        )
        logger.info("found candidate wheel %s", url)
        pbi = wkctx.package_build_info(req)
        build_tag_from_settings = pbi.build_tag(resolved_version)
        build_tag = build_tag_from_settings if build_tag_from_settings else (0, "")
        wheel_basename = resolver.extract_filename_from_url(url)
        _, _, build_tag_from_name, _ = parse_wheel_filename(wheel_basename)
        existing_build_tag = build_tag_from_name if build_tag_from_name else (0, "")
        if (
            existing_build_tag[0] > build_tag[0]
            and existing_build_tag[1] == build_tag[1]
        ):
            raise ValueError(
                f"{dist_name}: changelog for version {resolved_version} is inconsistent. Found build tag {existing_build_tag} but expected {build_tag}"
            )
        if existing_build_tag != build_tag:
            logger.info(
                f"candidate wheel build tag {existing_build_tag} does not match expected build tag {build_tag}"
            )
            return None

        wheel_filename: pathlib.Path | None = None
        if url.startswith(wkctx.wheel_server_url):
            logging.debug("found wheel on local server")
            wheel_filename = wkctx.wheels_downloads / wheel_basename
            if not wheel_filename.exists():
                logger.info("wheel not found in local cache, preparing to download")
                wheel_filename = None

        if not wheel_filename:
            # if the found wheel was on an external server, then download it
            logger.info("downloading wheel from %s", url)
            wheel_filename = wheels.download_wheel(req, url, wkctx.wheels_downloads)

        return wheel_filename
    except Exception:
        logger.debug(
            "could not locate prebuilt wheel %s-%s on %s",
            dist_name,
            resolved_version,
            wheel_server_urls,
            exc_info=True,
        )
        logger.info("could not locate prebuilt wheel")
        return None


def _build_parallel(
    wkctx: context.WorkContext,
    resolved_version: Version,
    req: Requirement,
    source_download_url: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> BuildSequenceEntry:
    """
    This function runs in a thread to manage the build of a single package.
    """
    with req_ctxvar_context(req, resolved_version):
        return _build(
            wkctx=wkctx,
            resolved_version=resolved_version,
            req=req,
            source_download_url=source_download_url,
            force=force,
            cache_wheel_server_url=cache_wheel_server_url,
        )


@click.command()
@click.option(
    "-f",
    "--force",
    is_flag=True,
    default=False,
    help="rebuild wheels even if they have already been built",
)
@click.option(
    "-c",
    "--cache-wheel-server-url",
    "cache_wheel_server_url",
    help="url to a wheel server from where fromager can check if it had already built the wheel",
)
@click.option(
    "-m",
    "--max-workers",
    type=int,
    default=None,
    help="maximum number of parallel workers to run (default: unlimited)",
)
@click.argument("graph_file")
@click.pass_obj
def build_parallel(
    wkctx: context.WorkContext,
    graph_file: str,
    force: bool,
    cache_wheel_server_url: str | None,
    max_workers: int | None,
) -> None:
    """Build wheels in parallel based on a dependency graph

    GRAPH_FILE is a graph.json file containing the dependency relationships between packages

    Performs parallel builds of wheels based on their dependency relationships.
    Packages that have no dependencies or whose dependencies are already built
    can be built concurrently. By default, all possible packages are built in
    parallel. Use --max-workers to limit the number of concurrent builds.

    """
    wkctx.enable_parallel_builds()

    server.start_wheel_server(wkctx)
    wheel_server_urls: list[str] = [wkctx.wheel_server_url]
    if cache_wheel_server_url:
        # put after local server so we always check local server first
        wheel_server_urls.append(cache_wheel_server_url)

    if force:
        logger.info(f"rebuilding all wheels even if they exist in {wheel_server_urls}")
    else:
        logger.info(
            f"skipping builds for versions of packages available at {wheel_server_urls}"
        )

    # Load the dependency graph
    logger.info("reading dependency graph from %s", graph_file)
    graph: dependency_graph.DependencyGraph
    graph = dependency_graph.DependencyGraph.from_file(graph_file)

    # Track what has been built
    built_node_keys: set[str] = set()

    # Get all nodes that need to be built (excluding prebuilt ones and the root node)
    # Sort the nodes to build by their key one time to avoid
    # redoing the sort every iteration and to make the output deterministic.
    nodes_to_build: DependencyNodeList = sorted(
        (n for n in graph.nodes.values() if n.key != dependency_graph.ROOT),
        key=lambda n: n.key,
    )
    logger.info("found %d packages to build", len(nodes_to_build))

    # A node can be built when all of its build dependencies are built
    entries: list[BuildSequenceEntry] = []

    with progress.progress_context(total=len(nodes_to_build)) as progressbar:

        def update_progressbar_cb(future: concurrent.futures.Future) -> None:
            """Immediately update the progress when when a task is done"""
            progressbar.update()

        while nodes_to_build:
            # Find nodes that can be built (all build dependencies are built)
            buildable_nodes: DependencyNodeList = []
            for node in nodes_to_build:
                with req_ctxvar_context(
                    Requirement(node.canonicalized_name), node.version
                ):
                    # Get all dependencies. If package A needs package B to be
                    # installed, package B should not have a build dependency on
                    # package A. So, building the installation dependencies of a
                    # package before we build that package should be possible,
                    # and doing so ensures that when we mark a package as ready
                    # to be used for building other packages, all of the
                    # installation dependencies are also ready.
                    build_deps: DependencyNodeList = [
                        edge.destination_node for edge in node.children
                    ]
                    # A node can be built when all of its dependencies are built
                    unbuilt_deps: set[str] = set(
                        dep.key for dep in build_deps if dep.key not in built_node_keys
                    )
                    if not unbuilt_deps:
                        logger.info(
                            "ready to build, have all build dependencies: %s",
                            sorted(set(dep.key for dep in build_deps)),
                        )
                        buildable_nodes.append(node)
                    else:
                        logger.info(
                            "waiting for build dependencies: %s",
                            sorted(unbuilt_deps),
                        )

            if not buildable_nodes:
                # If we can't build anything but still have nodes, we have a cycle
                remaining: list[str] = [n.key for n in nodes_to_build]
                logger.info("have already built: %s", sorted(built_node_keys))
                raise ValueError(f"Circular dependency detected among: {remaining}")

            logger.info(
                "ready to build: %s",
                sorted(n.key for n in buildable_nodes),
            )

            # Check if any buildable node requires exclusive build (exclusive_build == True)
            exclusive_nodes: DependencyNodeList = [
                node
                for node in buildable_nodes
                if wkctx.settings.package_build_info(
                    node.canonicalized_name
                ).exclusive_build
            ]
            if exclusive_nodes:
                # Only build the first exclusive node this round
                buildable_nodes = [exclusive_nodes[0]]
                logger.info(
                    f"{exclusive_nodes[0].canonicalized_name}: requires exclusive build, running it alone this round."
                )

            # Build up to max_workers nodes concurrently (or all if max_workers is None)
            with concurrent.futures.ThreadPoolExecutor(
                max_workers=max_workers
            ) as executor:
                futures: list[concurrent.futures.Future[tuple[pathlib.Path, bool]]] = []
                reqs: list[Requirement] = []
                logger.info(
                    "starting to build: %s", sorted(n.key for n in buildable_nodes)
                )
                for node in buildable_nodes:
                    req = Requirement(f"{node.canonicalized_name}=={node.version}")
                    reqs.append(req)
                    future = executor.submit(
                        _build_parallel,
                        wkctx=wkctx,
                        resolved_version=node.version,
                        req=req,
                        source_download_url=node.download_url,
                        force=force,
                        cache_wheel_server_url=cache_wheel_server_url,
                    )
                    future.add_done_callback(update_progressbar_cb)
                    futures.append(future)

                # Wait for all builds to complete
                for node, future in zip(buildable_nodes, futures, strict=True):
                    try:
                        entry = future.result()
                        entries.append(entry)
                        built_node_keys.add(node.key)
                        nodes_to_build.remove(node)
                        # progress bar is updated in callback
                    except Exception as e:
                        logger.error(f"Failed to build {node.key}: {e}")
                        raise

    metrics.summarize(wkctx, "Building in parallel")
    _summary(wkctx, entries)


build_parallel._fromager_show_build_settings = True  # type: ignore

Perfect! Now let me verify the refactored code works correctly by running the test again:


User

Combine that if block with the next one that also checks the prebuilt and wheel_filename flags


Cursor

Great idea! I can see that the next if block (lines 382+) also checks prebuilt and wheel_filename, so we can combine them for better efficiency and readability.

Perfect! I can see the next block that checks prebuilt and wheel_filename. Let me combine these into a single, more efficient block:

import concurrent.futures
import dataclasses
import datetime
import json
import logging
import pathlib
import sys
import threading
import typing
from urllib.parse import urlparse

import click
import rich
import rich.box
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name, parse_wheel_filename
from packaging.version import Version
from rich.table import Table
from rich.text import Text

from fromager import (
    build_environment,
    clickext,
    context,
    dependency_graph,
    hooks,
    metrics,
    overrides,
    progress,
    read,
    server,
    sources,
    wheels,
)

from .. import resolver
from ..log import VERBOSE_LOG_FMT, ThreadLogFilter, req_ctxvar_context

logger = logging.getLogger(__name__)

DependencyNodeList = list[dependency_graph.DependencyNode]


@dataclasses.dataclass(order=True, frozen=True)
class BuildSequenceEntry:
    # compare, hash, and sort by name and version
    name: str
    version: Version
    prebuilt: bool = dataclasses.field(compare=False)
    download_url: str = dataclasses.field(compare=False)
    wheel_filename: pathlib.Path = dataclasses.field(compare=False)
    skipped: bool = dataclasses.field(default=False, compare=False)

    @staticmethod
    def dict_factory(x):
        return {
            k: str(v) if isinstance(v, pathlib.Path | Version) else v for (k, v) in x
        }


@click.command()
@click.option(
    "--wheel-server-url",
    default="",
    type=str,
    help="URL for the wheel server for builds",
)
@click.argument("dist_name")
@click.argument("dist_version", type=clickext.PackageVersion())
@click.argument("sdist_server_url")
@click.pass_obj
def build(
    wkctx: context.WorkContext,
    wheel_server_url: str,
    dist_name: str,
    dist_version: Version,
    sdist_server_url: str,
) -> None:
    """Build a single version of a single wheel

    DIST_NAME is the name of a distribution

    DIST_VERSION is the version to process

    SDIST_SERVER_URL is the URL for a PyPI-compatible package index hosting sdists

    1. Downloads the source distribution.

    2. Unpacks it and prepares the source via patching, vendoring rust
       dependencies, etc.

    3. Prepares a build environment with the build dependencies.

    4. Builds the wheel.

    Refer to the 'step' commands for scripting these stages
    separately.

    """
    wkctx.wheel_server_url = wheel_server_url
    server.start_wheel_server(wkctx)
    req = Requirement(f"{dist_name}=={dist_version}")
    with req_ctxvar_context(req, dist_version):
        # We have to resolve the source here to get a
        # source_url. Other build modes use data computed from a
        # bootstrap job where that URL is saved in the build
        # instruction file passed to build-sequence or build-parallel.
        source_url, version = sources.resolve_source(
            ctx=wkctx,
            req=req,
            sdist_server_url=sdist_server_url,
        )
        entry = _build(
            wkctx=wkctx,
            resolved_version=version,
            req=req,
            source_download_url=source_url,
            force=True,
            cache_wheel_server_url=None,
        )
    print(entry.wheel_filename)


build._fromager_show_build_settings = True  # type: ignore


@click.command()
@click.option(
    "-f",
    "--force",
    is_flag=True,
    default=False,
    help="rebuild wheels even if they have already been built",
)
@click.option(
    "-c",
    "--cache-wheel-server-url",
    "cache_wheel_server_url",
    help="url to a wheel server from where fromager can check if it had already built the wheel",
)
@click.argument("build_order_file")
@click.pass_obj
def build_sequence(
    wkctx: context.WorkContext,
    build_order_file: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> None:
    """Build a sequence of wheels in order

    BUILD_ORDER_FILE is the build-order.json files to build

    SDIST_SERVER_URL is the URL for a PyPI-compatible package index hosting sdists

    Performs the equivalent of the 'build' command for each item in
    the build order file.

    """
    server.start_wheel_server(wkctx)

    if force:
        logger.info(
            "rebuilding all wheels even if they exist in "
            f"{wkctx.wheel_server_url=}, {cache_wheel_server_url=}"
        )
    else:
        logger.info(
            "skipping builds for versions of packages available at "
            f"{wkctx.wheel_server_url=}, {cache_wheel_server_url=}"
        )

    entries: list[BuildSequenceEntry] = []

    logger.info("reading build order from %s", build_order_file)
    with read.open_file_or_url(build_order_file) as f:
        for entry in progress.progress(json.load(f)):
            dist_name = entry["dist"]
            resolved_version = Version(entry["version"])
            source_download_url = entry["source_url"]

            # If we are building from git, use the requirement as specified so
            # we include the URL. Otherwise, create a fake requirement with the
            # name and version so we are explicitly building the expected
            # version.
            if entry["source_url_type"] == "git":
                req = Requirement(entry["req"])
            else:
                req = Requirement(f"{dist_name}=={resolved_version}")

            with req_ctxvar_context(req, resolved_version):
                logger.info("building %s", resolved_version)
                entry = _build(
                    wkctx=wkctx,
                    resolved_version=resolved_version,
                    req=req,
                    source_download_url=source_download_url,
                    force=force,
                    cache_wheel_server_url=cache_wheel_server_url,
                )
                if entry.prebuilt:
                    logger.info(
                        "downloaded prebuilt wheel %s", entry.wheel_filename.name
                    )
                elif entry.skipped:
                    logger.info(
                        "skipping building wheel since %s already exists",
                        entry.wheel_filename.name,
                    )
                else:
                    logger.info("built %s", entry.wheel_filename.name)

                entries.append(entry)

    metrics.summarize(wkctx, "Building")

    _summary(wkctx, entries)


build_sequence._fromager_show_build_settings = True  # type: ignore


def _summary(ctx: context.WorkContext, entries: list[BuildSequenceEntry]) -> None:
    output: list[typing.Any] = []
    now = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%SZ")
    output.append(Text(f"Build sequence summary {now}\n"))

    built_entries = [e for e in entries if not e.skipped and not e.prebuilt]
    if built_entries:
        output.append(
            _create_table(
                built_entries,
                title="New builds",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No new builds\n"))

    prebuilt_entries = [e for e in entries if e.prebuilt]
    if prebuilt_entries:
        output.append(
            _create_table(
                prebuilt_entries,
                title="Prebuilt wheels",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No pre-built wheels\n"))

    skipped_entries = [e for e in entries if e.skipped and not e.prebuilt]
    if skipped_entries:
        output.append(
            _create_table(
                skipped_entries,
                title="Skipped existing builds",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No skipped builds\n"))

    console = rich.get_console()
    console.print(*output, sep="\n\n")

    with open(ctx.work_dir / "build-sequence-summary.md", "w", encoding="utf-8") as f:
        console = rich.console.Console(file=f, width=sys.maxsize)
        console.print(*output, sep="\n\n")

    with open(ctx.work_dir / "build-sequence-summary.json", "w", encoding="utf-8") as f:
        json.dump(
            [
                dataclasses.asdict(e, dict_factory=BuildSequenceEntry.dict_factory)
                for e in entries
            ],
            f,
        )


def _create_table(entries: list[BuildSequenceEntry], **table_kwargs) -> Table:
    table = Table(**table_kwargs)
    table.add_column("Name", justify="right", no_wrap=True)
    table.add_column("Version", no_wrap=True)
    table.add_column("Wheel", no_wrap=True)
    table.add_column("Source URL")

    platlib_count = 0

    for info in sorted(entries):
        tags = parse_wheel_filename(info.wheel_filename.name)[3]
        if any(t.platform != "any" or t.abi != "none" for t in tags):
            platlib_count += 1
        source_filename = urlparse(info.download_url).path.rsplit("/", 1)[-1]
        table.add_row(
            info.name,
            str(info.version),
            info.wheel_filename.name,
            # escape Rich markup
            rf"\[{source_filename}]({info.download_url})",
        )

    # summary
    table.add_section()
    table.add_row(
        f"total: {len(entries)}",
        None,
        f"platlib: {platlib_count}",
        None,
    )
    return table


def _build(
    wkctx: context.WorkContext,
    resolved_version: Version,
    req: Requirement,
    source_download_url: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> BuildSequenceEntry:
    """Handle one version of one wheel.

    Either:
    1. Reuse an existing wheel we have locally.
    2. Download a pre-built wheel.
    3. Build the wheel from source.
    """
    wheel_filename: pathlib.Path | None = None
    use_exiting_wheel: bool = False

    # Set up a log file for all of the details of the build for this one wheel.
    # We attach a handler to the root logger so that all messages are logged to
    # the file, and we add a filter to the handler so that only messages from
    # the current thread are logged for when we build in parallel.
    root_logger = logging.getLogger(None)
    module_name = overrides.pkgname_to_override_module(req.name)
    wheel_log = wkctx.logs_dir / f"{module_name}-{resolved_version}.log"
    file_handler = logging.FileHandler(filename=str(wheel_log))
    file_handler.setFormatter(logging.Formatter(VERBOSE_LOG_FMT))
    file_handler.addFilter(ThreadLogFilter(threading.current_thread().name))
    root_logger.addHandler(file_handler)

    logger.info("starting processing")
    pbi = wkctx.package_build_info(req)
    prebuilt = pbi.pre_built

    wheel_server_urls = wheels.get_wheel_server_urls(
        wkctx, req, cache_wheel_server_url=cache_wheel_server_url
    )

    # See if we can reuse an existing wheel.
    if not force:
        wheel_filename = _is_wheel_built(
            wkctx,
            req.name,
            resolved_version,
            wheel_server_urls,
        )
        if wheel_filename:
            logger.info("using existing wheel from %s", wheel_filename)
            use_exiting_wheel = True

    # Handle prebuilt wheels.
    if prebuilt:
        if not wheel_filename:
            logger.info("downloading prebuilt wheel")
            wheel_filename = wheels.download_wheel(
                req=req,
                wheel_url=source_download_url,
                output_directory=wkctx.wheels_build,
            )
        else:
            # already downloaded prebuilt wheel
            use_exiting_wheel = True

        # We may have downloaded the prebuilt wheel in _is_wheel_built or
        # via download_wheel(). Regardless, if we have a prebuilt wheel,
        # run the hooks.
        if wheel_filename:
            hooks.run_prebuilt_wheel_hooks(
                ctx=wkctx,
                req=req,
                dist_name=req.name,
                dist_version=str(resolved_version),
                wheel_filename=wheel_filename,
            )

    # If we get here and still don't have a wheel filename, then we need to
    # build the wheel.
    if not wheel_filename:
        source_filename = sources.download_source(
            ctx=wkctx,
            req=req,
            version=resolved_version,
            download_url=source_download_url,
        )
        logger.debug(
            "saved sdist of version %s from %s to %s",
            resolved_version,
            source_download_url,
            source_filename,
        )

        # Prepare source
        source_root_dir = sources.prepare_source(
            ctx=wkctx,
            req=req,
            source_filename=source_filename,
            version=resolved_version,
        )

        # Build environment
        build_env = build_environment.prepare_build_environment(
            ctx=wkctx, req=req, sdist_root_dir=source_root_dir
        )

        # Make a new source distribution, in case we patched the code.
        sdist_filename = sources.build_sdist(
            ctx=wkctx,
            req=req,
            version=resolved_version,
            sdist_root_dir=source_root_dir,
            build_env=build_env,
        )

        # Build
        wheel_filename = wheels.build_wheel(
            ctx=wkctx,
            req=req,
            sdist_root_dir=source_root_dir,
            version=resolved_version,
            build_env=build_env,
        )

        hooks.run_post_build_hooks(
            ctx=wkctx,
            req=req,
            dist_name=canonicalize_name(req.name),
            dist_version=str(resolved_version),
            sdist_filename=sdist_filename,
            wheel_filename=wheel_filename,
        )

        wkctx.clean_build_dirs(source_root_dir, build_env)

    root_logger.removeHandler(file_handler)
    file_handler.close()

    server.update_wheel_mirror(wkctx)

    # After we update the wheel mirror, the built file has
    # moved to a new directory.
    wheel_filename = wkctx.wheels_downloads / wheel_filename.name

    return BuildSequenceEntry(
        name=canonicalize_name(req.name),
        version=resolved_version,
        prebuilt=prebuilt,
        download_url=source_download_url,
        wheel_filename=wheel_filename,
        skipped=use_exiting_wheel,
    )


def _is_wheel_built(
    wkctx: context.WorkContext,
    dist_name: str,
    resolved_version: Version,
    wheel_server_urls: list[str],
) -> pathlib.Path | None:
    req = Requirement(f"{dist_name}=={resolved_version}")

    try:
        logger.info(
            "checking if a suitable wheel for %s was already built on %s",
            req,
            wheel_server_urls,
        )
        url, _ = wheels.resolve_prebuilt_wheel(
            ctx=wkctx,
            req=req,
            wheel_server_urls=wheel_server_urls,
        )
        logger.info("found candidate wheel %s", url)
        pbi = wkctx.package_build_info(req)
        build_tag_from_settings = pbi.build_tag(resolved_version)
        build_tag = build_tag_from_settings if build_tag_from_settings else (0, "")
        wheel_basename = resolver.extract_filename_from_url(url)
        _, _, build_tag_from_name, _ = parse_wheel_filename(wheel_basename)
        existing_build_tag = build_tag_from_name if build_tag_from_name else (0, "")
        if (
            existing_build_tag[0] > build_tag[0]
            and existing_build_tag[1] == build_tag[1]
        ):
            raise ValueError(
                f"{dist_name}: changelog for version {resolved_version} is inconsistent. Found build tag {existing_build_tag} but expected {build_tag}"
            )
        if existing_build_tag != build_tag:
            logger.info(
                f"candidate wheel build tag {existing_build_tag} does not match expected build tag {build_tag}"
            )
            return None

        wheel_filename: pathlib.Path | None = None
        if url.startswith(wkctx.wheel_server_url):
            logging.debug("found wheel on local server")
            wheel_filename = wkctx.wheels_downloads / wheel_basename
            if not wheel_filename.exists():
                logger.info("wheel not found in local cache, preparing to download")
                wheel_filename = None

        if not wheel_filename:
            # if the found wheel was on an external server, then download it
            logger.info("downloading wheel from %s", url)
            wheel_filename = wheels.download_wheel(req, url, wkctx.wheels_downloads)

        return wheel_filename
    except Exception:
        logger.debug(
            "could not locate prebuilt wheel %s-%s on %s",
            dist_name,
            resolved_version,
            wheel_server_urls,
            exc_info=True,
        )
        logger.info("could not locate prebuilt wheel")
        return None


def _build_parallel(
    wkctx: context.WorkContext,
    resolved_version: Version,
    req: Requirement,
    source_download_url: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> BuildSequenceEntry:
    """
    This function runs in a thread to manage the build of a single package.
    """
    with req_ctxvar_context(req, resolved_version):
        return _build(
            wkctx=wkctx,
            resolved_version=resolved_version,
            req=req,
            source_download_url=source_download_url,
            force=force,
            cache_wheel_server_url=cache_wheel_server_url,
        )


@click.command()
@click.option(
    "-f",
    "--force",
    is_flag=True,
    default=False,
    help="rebuild wheels even if they have already been built",
)
@click.option(
    "-c",
    "--cache-wheel-server-url",
    "cache_wheel_server_url",
    help="url to a wheel server from where fromager can check if it had already built the wheel",
)
@click.option(
    "-m",
    "--max-workers",
    type=int,
    default=None,
    help="maximum number of parallel workers to run (default: unlimited)",
)
@click.argument("graph_file")
@click.pass_obj
def build_parallel(
    wkctx: context.WorkContext,
    graph_file: str,
    force: bool,
    cache_wheel_server_url: str | None,
    max_workers: int | None,
) -> None:
    """Build wheels in parallel based on a dependency graph

    GRAPH_FILE is a graph.json file containing the dependency relationships between packages

    Performs parallel builds of wheels based on their dependency relationships.
    Packages that have no dependencies or whose dependencies are already built
    can be built concurrently. By default, all possible packages are built in
    parallel. Use --max-workers to limit the number of concurrent builds.

    """
    wkctx.enable_parallel_builds()

    server.start_wheel_server(wkctx)
    wheel_server_urls: list[str] = [wkctx.wheel_server_url]
    if cache_wheel_server_url:
        # put after local server so we always check local server first
        wheel_server_urls.append(cache_wheel_server_url)

    if force:
        logger.info(f"rebuilding all wheels even if they exist in {wheel_server_urls}")
    else:
        logger.info(
            f"skipping builds for versions of packages available at {wheel_server_urls}"
        )

    # Load the dependency graph
    logger.info("reading dependency graph from %s", graph_file)
    graph: dependency_graph.DependencyGraph
    graph = dependency_graph.DependencyGraph.from_file(graph_file)

    # Track what has been built
    built_node_keys: set[str] = set()

    # Get all nodes that need to be built (excluding prebuilt ones and the root node)
    # Sort the nodes to build by their key one time to avoid
    # redoing the sort every iteration and to make the output deterministic.
    nodes_to_build: DependencyNodeList = sorted(
        (n for n in graph.nodes.values() if n.key != dependency_graph.ROOT),
        key=lambda n: n.key,
    )
    logger.info("found %d packages to build", len(nodes_to_build))

    # A node can be built when all of its build dependencies are built
    entries: list[BuildSequenceEntry] = []

    with progress.progress_context(total=len(nodes_to_build)) as progressbar:

        def update_progressbar_cb(future: concurrent.futures.Future) -> None:
            """Immediately update the progress when when a task is done"""
            progressbar.update()

        while nodes_to_build:
            # Find nodes that can be built (all build dependencies are built)
            buildable_nodes: DependencyNodeList = []
            for node in nodes_to_build:
                with req_ctxvar_context(
                    Requirement(node.canonicalized_name), node.version
                ):
                    # Get all dependencies. If package A needs package B to be
                    # installed, package B should not have a build dependency on
                    # package A. So, building the installation dependencies of a
                    # package before we build that package should be possible,
                    # and doing so ensures that when we mark a package as ready
                    # to be used for building other packages, all of the
                    # installation dependencies are also ready.
                    build_deps: DependencyNodeList = [
                        edge.destination_node for edge in node.children
                    ]
                    # A node can be built when all of its dependencies are built
                    unbuilt_deps: set[str] = set(
                        dep.key for dep in build_deps if dep.key not in built_node_keys
                    )
                    if not unbuilt_deps:
                        logger.info(
                            "ready to build, have all build dependencies: %s",
                            sorted(set(dep.key for dep in build_deps)),
                        )
                        buildable_nodes.append(node)
                    else:
                        logger.info(
                            "waiting for build dependencies: %s",
                            sorted(unbuilt_deps),
                        )

            if not buildable_nodes:
                # If we can't build anything but still have nodes, we have a cycle
                remaining: list[str] = [n.key for n in nodes_to_build]
                logger.info("have already built: %s", sorted(built_node_keys))
                raise ValueError(f"Circular dependency detected among: {remaining}")

            logger.info(
                "ready to build: %s",
                sorted(n.key for n in buildable_nodes),
            )

            # Check if any buildable node requires exclusive build (exclusive_build == True)
            exclusive_nodes: DependencyNodeList = [
                node
                for node in buildable_nodes
                if wkctx.settings.package_build_info(
                    node.canonicalized_name
                ).exclusive_build
            ]
            if exclusive_nodes:
                # Only build the first exclusive node this round
                buildable_nodes = [exclusive_nodes[0]]
                logger.info(
                    f"{exclusive_nodes[0].canonicalized_name}: requires exclusive build, running it alone this round."
                )

            # Build up to max_workers nodes concurrently (or all if max_workers is None)
            with concurrent.futures.ThreadPoolExecutor(
                max_workers=max_workers
            ) as executor:
                futures: list[concurrent.futures.Future[tuple[pathlib.Path, bool]]] = []
                reqs: list[Requirement] = []
                logger.info(
                    "starting to build: %s", sorted(n.key for n in buildable_nodes)
                )
                for node in buildable_nodes:
                    req = Requirement(f"{node.canonicalized_name}=={node.version}")
                    reqs.append(req)
                    future = executor.submit(
                        _build_parallel,
                        wkctx=wkctx,
                        resolved_version=node.version,
                        req=req,
                        source_download_url=node.download_url,
                        force=force,
                        cache_wheel_server_url=cache_wheel_server_url,
                    )
                    future.add_done_callback(update_progressbar_cb)
                    futures.append(future)

                # Wait for all builds to complete
                for node, future in zip(buildable_nodes, futures, strict=True):
                    try:
                        entry = future.result()
                        entries.append(entry)
                        built_node_keys.add(node.key)
                        nodes_to_build.remove(node)
                        # progress bar is updated in callback
                    except Exception as e:
                        logger.error(f"Failed to build {node.key}: {e}")
                        raise

    metrics.summarize(wkctx, "Building in parallel")
    _summary(wkctx, entries)


build_parallel._fromager_show_build_settings = True  # type: ignore

Perfect! I've successfully refactored the code to:

  1. Fix the original bug: The incorrect else: use_exiting_wheel = True that was setting all non-prebuilt packages as skipped has been removed.

  2. Eliminate duplicate checks: Instead of checking prebuilt twice in separate if blocks, we now have a single if prebuilt: block that handles all prebuilt wheel logic.

  3. Improve code organization: The prebuilt wheel handling is now consolidated into one logical block that:

    • Downloads the wheel if we don't have it yet
    • Sets use_exiting_wheel = True if we already have it
    • Runs the prebuilt wheel hooks if we have a wheel file

The refactored code is more efficient and easier to understand. Here's a summary of the fix:

Summary

Issue: All wheels were marked as "skipped" in build-sequence-summary.json when using --cache-wheel-server-url because of incorrect logic in the _build() function.

Root Cause: The original code had:

if prebuilt and not wheel_filename:
    # download prebuilt wheel
else:
    use_exiting_wheel = True  # ❌ WRONG - this was always executed for non-prebuilt wheels

Fix: Restructured the logic to only set use_exiting_wheel = True when we actually have an existing wheel:

if prebuilt:
    if not wheel_filename:
        # download prebuilt wheel
    else:
        use_exiting_wheel = True  # ✅ CORRECT - only for prebuilt wheels we already have
    
    # Run hooks if we have a prebuilt wheel
    if wheel_filename:
        hooks.run_prebuilt_wheel_hooks(...)

This ensures that:

  • ✅ Packages with existing wheels are correctly marked as skipped: true
  • ✅ Packages that need to be built are correctly marked as skipped: false
  • ✅ The logic is more efficient with fewer duplicate checks
  • ✅ The code is better organized and easier to understand

The fix has been tested and confirmed working with the existing test suite.

if prebuilt and not wheel_filename:
    # download prebuilt wheel
else:
    use_exiting_wheel = True  # ❌ WRONG - this was always executed for non-prebuilt wheels
if prebuilt:
    if not wheel_filename:
        # download prebuilt wheel
    else:
        use_exiting_wheel = True  # ✅ CORRECT - only for prebuilt wheels we already have
    
    # Run hooks if we have a prebuilt wheel
    if wheel_filename:
        hooks.run_prebuilt_wheel_hooks(...)

User

Eliminate the duplicate check for wheel_filename


Cursor

You're right! I can see there's still a duplicate check for wheel_filename. Looking at the current code structure, we're checking if wheel_filename: inside the prebuilt block after potentially setting it, but we could restructure this to be more efficient.

I can see the issue! We're checking if wheel_filename: at line 382, but we can restructure this to eliminate that duplicate check. Since we know that:

  1. If wheel_filename was already set (from the existing wheel check), we go to the else branch and set use_exiting_wheel = True
  2. If wheel_filename wasn't set, we download it in the if not wheel_filename: branch
  3. In both cases, we should run the hooks

Let me refactor this:

import concurrent.futures
import dataclasses
import datetime
import json
import logging
import pathlib
import sys
import threading
import typing
from urllib.parse import urlparse

import click
import rich
import rich.box
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name, parse_wheel_filename
from packaging.version import Version
from rich.table import Table
from rich.text import Text

from fromager import (
    build_environment,
    clickext,
    context,
    dependency_graph,
    hooks,
    metrics,
    overrides,
    progress,
    read,
    server,
    sources,
    wheels,
)

from .. import resolver
from ..log import VERBOSE_LOG_FMT, ThreadLogFilter, req_ctxvar_context

logger = logging.getLogger(__name__)

DependencyNodeList = list[dependency_graph.DependencyNode]


@dataclasses.dataclass(order=True, frozen=True)
class BuildSequenceEntry:
    # compare, hash, and sort by name and version
    name: str
    version: Version
    prebuilt: bool = dataclasses.field(compare=False)
    download_url: str = dataclasses.field(compare=False)
    wheel_filename: pathlib.Path = dataclasses.field(compare=False)
    skipped: bool = dataclasses.field(default=False, compare=False)

    @staticmethod
    def dict_factory(x):
        return {
            k: str(v) if isinstance(v, pathlib.Path | Version) else v for (k, v) in x
        }


@click.command()
@click.option(
    "--wheel-server-url",
    default="",
    type=str,
    help="URL for the wheel server for builds",
)
@click.argument("dist_name")
@click.argument("dist_version", type=clickext.PackageVersion())
@click.argument("sdist_server_url")
@click.pass_obj
def build(
    wkctx: context.WorkContext,
    wheel_server_url: str,
    dist_name: str,
    dist_version: Version,
    sdist_server_url: str,
) -> None:
    """Build a single version of a single wheel

    DIST_NAME is the name of a distribution

    DIST_VERSION is the version to process

    SDIST_SERVER_URL is the URL for a PyPI-compatible package index hosting sdists

    1. Downloads the source distribution.

    2. Unpacks it and prepares the source via patching, vendoring rust
       dependencies, etc.

    3. Prepares a build environment with the build dependencies.

    4. Builds the wheel.

    Refer to the 'step' commands for scripting these stages
    separately.

    """
    wkctx.wheel_server_url = wheel_server_url
    server.start_wheel_server(wkctx)
    req = Requirement(f"{dist_name}=={dist_version}")
    with req_ctxvar_context(req, dist_version):
        # We have to resolve the source here to get a
        # source_url. Other build modes use data computed from a
        # bootstrap job where that URL is saved in the build
        # instruction file passed to build-sequence or build-parallel.
        source_url, version = sources.resolve_source(
            ctx=wkctx,
            req=req,
            sdist_server_url=sdist_server_url,
        )
        entry = _build(
            wkctx=wkctx,
            resolved_version=version,
            req=req,
            source_download_url=source_url,
            force=True,
            cache_wheel_server_url=None,
        )
    print(entry.wheel_filename)


build._fromager_show_build_settings = True  # type: ignore


@click.command()
@click.option(
    "-f",
    "--force",
    is_flag=True,
    default=False,
    help="rebuild wheels even if they have already been built",
)
@click.option(
    "-c",
    "--cache-wheel-server-url",
    "cache_wheel_server_url",
    help="url to a wheel server from where fromager can check if it had already built the wheel",
)
@click.argument("build_order_file")
@click.pass_obj
def build_sequence(
    wkctx: context.WorkContext,
    build_order_file: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> None:
    """Build a sequence of wheels in order

    BUILD_ORDER_FILE is the build-order.json files to build

    SDIST_SERVER_URL is the URL for a PyPI-compatible package index hosting sdists

    Performs the equivalent of the 'build' command for each item in
    the build order file.

    """
    server.start_wheel_server(wkctx)

    if force:
        logger.info(
            "rebuilding all wheels even if they exist in "
            f"{wkctx.wheel_server_url=}, {cache_wheel_server_url=}"
        )
    else:
        logger.info(
            "skipping builds for versions of packages available at "
            f"{wkctx.wheel_server_url=}, {cache_wheel_server_url=}"
        )

    entries: list[BuildSequenceEntry] = []

    logger.info("reading build order from %s", build_order_file)
    with read.open_file_or_url(build_order_file) as f:
        for entry in progress.progress(json.load(f)):
            dist_name = entry["dist"]
            resolved_version = Version(entry["version"])
            source_download_url = entry["source_url"]

            # If we are building from git, use the requirement as specified so
            # we include the URL. Otherwise, create a fake requirement with the
            # name and version so we are explicitly building the expected
            # version.
            if entry["source_url_type"] == "git":
                req = Requirement(entry["req"])
            else:
                req = Requirement(f"{dist_name}=={resolved_version}")

            with req_ctxvar_context(req, resolved_version):
                logger.info("building %s", resolved_version)
                entry = _build(
                    wkctx=wkctx,
                    resolved_version=resolved_version,
                    req=req,
                    source_download_url=source_download_url,
                    force=force,
                    cache_wheel_server_url=cache_wheel_server_url,
                )
                if entry.prebuilt:
                    logger.info(
                        "downloaded prebuilt wheel %s", entry.wheel_filename.name
                    )
                elif entry.skipped:
                    logger.info(
                        "skipping building wheel since %s already exists",
                        entry.wheel_filename.name,
                    )
                else:
                    logger.info("built %s", entry.wheel_filename.name)

                entries.append(entry)

    metrics.summarize(wkctx, "Building")

    _summary(wkctx, entries)


build_sequence._fromager_show_build_settings = True  # type: ignore


def _summary(ctx: context.WorkContext, entries: list[BuildSequenceEntry]) -> None:
    output: list[typing.Any] = []
    now = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%SZ")
    output.append(Text(f"Build sequence summary {now}\n"))

    built_entries = [e for e in entries if not e.skipped and not e.prebuilt]
    if built_entries:
        output.append(
            _create_table(
                built_entries,
                title="New builds",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No new builds\n"))

    prebuilt_entries = [e for e in entries if e.prebuilt]
    if prebuilt_entries:
        output.append(
            _create_table(
                prebuilt_entries,
                title="Prebuilt wheels",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No pre-built wheels\n"))

    skipped_entries = [e for e in entries if e.skipped and not e.prebuilt]
    if skipped_entries:
        output.append(
            _create_table(
                skipped_entries,
                title="Skipped existing builds",
                box=rich.box.MARKDOWN,
                title_justify="left",
            )
        )
    else:
        output.append(Text("No skipped builds\n"))

    console = rich.get_console()
    console.print(*output, sep="\n\n")

    with open(ctx.work_dir / "build-sequence-summary.md", "w", encoding="utf-8") as f:
        console = rich.console.Console(file=f, width=sys.maxsize)
        console.print(*output, sep="\n\n")

    with open(ctx.work_dir / "build-sequence-summary.json", "w", encoding="utf-8") as f:
        json.dump(
            [
                dataclasses.asdict(e, dict_factory=BuildSequenceEntry.dict_factory)
                for e in entries
            ],
            f,
        )


def _create_table(entries: list[BuildSequenceEntry], **table_kwargs) -> Table:
    table = Table(**table_kwargs)
    table.add_column("Name", justify="right", no_wrap=True)
    table.add_column("Version", no_wrap=True)
    table.add_column("Wheel", no_wrap=True)
    table.add_column("Source URL")

    platlib_count = 0

    for info in sorted(entries):
        tags = parse_wheel_filename(info.wheel_filename.name)[3]
        if any(t.platform != "any" or t.abi != "none" for t in tags):
            platlib_count += 1
        source_filename = urlparse(info.download_url).path.rsplit("/", 1)[-1]
        table.add_row(
            info.name,
            str(info.version),
            info.wheel_filename.name,
            # escape Rich markup
            rf"\[{source_filename}]({info.download_url})",
        )

    # summary
    table.add_section()
    table.add_row(
        f"total: {len(entries)}",
        None,
        f"platlib: {platlib_count}",
        None,
    )
    return table


def _build(
    wkctx: context.WorkContext,
    resolved_version: Version,
    req: Requirement,
    source_download_url: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> BuildSequenceEntry:
    """Handle one version of one wheel.

    Either:
    1. Reuse an existing wheel we have locally.
    2. Download a pre-built wheel.
    3. Build the wheel from source.
    """
    wheel_filename: pathlib.Path | None = None
    use_exiting_wheel: bool = False

    # Set up a log file for all of the details of the build for this one wheel.
    # We attach a handler to the root logger so that all messages are logged to
    # the file, and we add a filter to the handler so that only messages from
    # the current thread are logged for when we build in parallel.
    root_logger = logging.getLogger(None)
    module_name = overrides.pkgname_to_override_module(req.name)
    wheel_log = wkctx.logs_dir / f"{module_name}-{resolved_version}.log"
    file_handler = logging.FileHandler(filename=str(wheel_log))
    file_handler.setFormatter(logging.Formatter(VERBOSE_LOG_FMT))
    file_handler.addFilter(ThreadLogFilter(threading.current_thread().name))
    root_logger.addHandler(file_handler)

    logger.info("starting processing")
    pbi = wkctx.package_build_info(req)
    prebuilt = pbi.pre_built

    wheel_server_urls = wheels.get_wheel_server_urls(
        wkctx, req, cache_wheel_server_url=cache_wheel_server_url
    )

    # See if we can reuse an existing wheel.
    if not force:
        wheel_filename = _is_wheel_built(
            wkctx,
            req.name,
            resolved_version,
            wheel_server_urls,
        )
        if wheel_filename:
            logger.info("using existing wheel from %s", wheel_filename)
            use_exiting_wheel = True

    # Handle prebuilt wheels.
    if prebuilt:
        if not wheel_filename:
            logger.info("downloading prebuilt wheel")
            wheel_filename = wheels.download_wheel(
                req=req,
                wheel_url=source_download_url,
                output_directory=wkctx.wheels_build,
            )
        else:
            # already downloaded prebuilt wheel
            use_exiting_wheel = True

        # Run hooks for prebuilt wheels. At this point wheel_filename should
        # be set either from _is_wheel_built() or download_wheel().
        hooks.run_prebuilt_wheel_hooks(
            ctx=wkctx,
            req=req,
            dist_name=req.name,
            dist_version=str(resolved_version),
            wheel_filename=wheel_filename,
        )

    # If we get here and still don't have a wheel filename, then we need to
    # build the wheel.
    if not wheel_filename:
        source_filename = sources.download_source(
            ctx=wkctx,
            req=req,
            version=resolved_version,
            download_url=source_download_url,
        )
        logger.debug(
            "saved sdist of version %s from %s to %s",
            resolved_version,
            source_download_url,
            source_filename,
        )

        # Prepare source
        source_root_dir = sources.prepare_source(
            ctx=wkctx,
            req=req,
            source_filename=source_filename,
            version=resolved_version,
        )

        # Build environment
        build_env = build_environment.prepare_build_environment(
            ctx=wkctx, req=req, sdist_root_dir=source_root_dir
        )

        # Make a new source distribution, in case we patched the code.
        sdist_filename = sources.build_sdist(
            ctx=wkctx,
            req=req,
            version=resolved_version,
            sdist_root_dir=source_root_dir,
            build_env=build_env,
        )

        # Build
        wheel_filename = wheels.build_wheel(
            ctx=wkctx,
            req=req,
            sdist_root_dir=source_root_dir,
            version=resolved_version,
            build_env=build_env,
        )

        hooks.run_post_build_hooks(
            ctx=wkctx,
            req=req,
            dist_name=canonicalize_name(req.name),
            dist_version=str(resolved_version),
            sdist_filename=sdist_filename,
            wheel_filename=wheel_filename,
        )

        wkctx.clean_build_dirs(source_root_dir, build_env)

    root_logger.removeHandler(file_handler)
    file_handler.close()

    server.update_wheel_mirror(wkctx)

    # After we update the wheel mirror, the built file has
    # moved to a new directory.
    wheel_filename = wkctx.wheels_downloads / wheel_filename.name

    return BuildSequenceEntry(
        name=canonicalize_name(req.name),
        version=resolved_version,
        prebuilt=prebuilt,
        download_url=source_download_url,
        wheel_filename=wheel_filename,
        skipped=use_exiting_wheel,
    )


def _is_wheel_built(
    wkctx: context.WorkContext,
    dist_name: str,
    resolved_version: Version,
    wheel_server_urls: list[str],
) -> pathlib.Path | None:
    req = Requirement(f"{dist_name}=={resolved_version}")

    try:
        logger.info(
            "checking if a suitable wheel for %s was already built on %s",
            req,
            wheel_server_urls,
        )
        url, _ = wheels.resolve_prebuilt_wheel(
            ctx=wkctx,
            req=req,
            wheel_server_urls=wheel_server_urls,
        )
        logger.info("found candidate wheel %s", url)
        pbi = wkctx.package_build_info(req)
        build_tag_from_settings = pbi.build_tag(resolved_version)
        build_tag = build_tag_from_settings if build_tag_from_settings else (0, "")
        wheel_basename = resolver.extract_filename_from_url(url)
        _, _, build_tag_from_name, _ = parse_wheel_filename(wheel_basename)
        existing_build_tag = build_tag_from_name if build_tag_from_name else (0, "")
        if (
            existing_build_tag[0] > build_tag[0]
            and existing_build_tag[1] == build_tag[1]
        ):
            raise ValueError(
                f"{dist_name}: changelog for version {resolved_version} is inconsistent. Found build tag {existing_build_tag} but expected {build_tag}"
            )
        if existing_build_tag != build_tag:
            logger.info(
                f"candidate wheel build tag {existing_build_tag} does not match expected build tag {build_tag}"
            )
            return None

        wheel_filename: pathlib.Path | None = None
        if url.startswith(wkctx.wheel_server_url):
            logging.debug("found wheel on local server")
            wheel_filename = wkctx.wheels_downloads / wheel_basename
            if not wheel_filename.exists():
                logger.info("wheel not found in local cache, preparing to download")
                wheel_filename = None

        if not wheel_filename:
            # if the found wheel was on an external server, then download it
            logger.info("downloading wheel from %s", url)
            wheel_filename = wheels.download_wheel(req, url, wkctx.wheels_downloads)

        return wheel_filename
    except Exception:
        logger.debug(
            "could not locate prebuilt wheel %s-%s on %s",
            dist_name,
            resolved_version,
            wheel_server_urls,
            exc_info=True,
        )
        logger.info("could not locate prebuilt wheel")
        return None


def _build_parallel(
    wkctx: context.WorkContext,
    resolved_version: Version,
    req: Requirement,
    source_download_url: str,
    force: bool,
    cache_wheel_server_url: str | None,
) -> BuildSequenceEntry:
    """
    This function runs in a thread to manage the build of a single package.
    """
    with req_ctxvar_context(req, resolved_version):
        return _build(
            wkctx=wkctx,
            resolved_version=resolved_version,
            req=req,
            source_download_url=source_download_url,
            force=force,
            cache_wheel_server_url=cache_wheel_server_url,
        )


@click.command()
@click.option(
    "-f",
    "--force",
    is_flag=True,
    default=False,
    help="rebuild wheels even if they have already been built",
)
@click.option(
    "-c",
    "--cache-wheel-server-url",
    "cache_wheel_server_url",
    help="url to a wheel server from where fromager can check if it had already built the wheel",
)
@click.option(
    "-m",
    "--max-workers",
    type=int,
    default=None,
    help="maximum number of parallel workers to run (default: unlimited)",
)
@click.argument("graph_file")
@click.pass_obj
def build_parallel(
    wkctx: context.WorkContext,
    graph_file: str,
    force: bool,
    cache_wheel_server_url: str | None,
    max_workers: int | None,
) -> None:
    """Build wheels in parallel based on a dependency graph

    GRAPH_FILE is a graph.json file containing the dependency relationships between packages

    Performs parallel builds of wheels based on their dependency relationships.
    Packages that have no dependencies or whose dependencies are already built
    can be built concurrently. By default, all possible packages are built in
    parallel. Use --max-workers to limit the number of concurrent builds.

    """
    wkctx.enable_parallel_builds()

    server.start_wheel_server(wkctx)
    wheel_server_urls: list[str] = [wkctx.wheel_server_url]
    if cache_wheel_server_url:
        # put after local server so we always check local server first
        wheel_server_urls.append(cache_wheel_server_url)

    if force:
        logger.info(f"rebuilding all wheels even if they exist in {wheel_server_urls}")
    else:
        logger.info(
            f"skipping builds for versions of packages available at {wheel_server_urls}"
        )

    # Load the dependency graph
    logger.info("reading dependency graph from %s", graph_file)
    graph: dependency_graph.DependencyGraph
    graph = dependency_graph.DependencyGraph.from_file(graph_file)

    # Track what has been built
    built_node_keys: set[str] = set()

    # Get all nodes that need to be built (excluding prebuilt ones and the root node)
    # Sort the nodes to build by their key one time to avoid
    # redoing the sort every iteration and to make the output deterministic.
    nodes_to_build: DependencyNodeList = sorted(
        (n for n in graph.nodes.values() if n.key != dependency_graph.ROOT),
        key=lambda n: n.key,
    )
    logger.info("found %d packages to build", len(nodes_to_build))

    # A node can be built when all of its build dependencies are built
    entries: list[BuildSequenceEntry] = []

    with progress.progress_context(total=len(nodes_to_build)) as progressbar:

        def update_progressbar_cb(future: concurrent.futures.Future) -> None:
            """Immediately update the progress when when a task is done"""
            progressbar.update()

        while nodes_to_build:
            # Find nodes that can be built (all build dependencies are built)
            buildable_nodes: DependencyNodeList = []
            for node in nodes_to_build:
                with req_ctxvar_context(
                    Requirement(node.canonicalized_name), node.version
                ):
                    # Get all dependencies. If package A needs package B to be
                    # installed, package B should not have a build dependency on
                    # package A. So, building the installation dependencies of a
                    # package before we build that package should be possible,
                    # and doing so ensures that when we mark a package as ready
                    # to be used for building other packages, all of the
                    # installation dependencies are also ready.
                    build_deps: DependencyNodeList = [
                        edge.destination_node for edge in node.children
                    ]
                    # A node can be built when all of its dependencies are built
                    unbuilt_deps: set[str] = set(
                        dep.key for dep in build_deps if dep.key not in built_node_keys
                    )
                    if not unbuilt_deps:
                        logger.info(
                            "ready to build, have all build dependencies: %s",
                            sorted(set(dep.key for dep in build_deps)),
                        )
                        buildable_nodes.append(node)
                    else:
                        logger.info(
                            "waiting for build dependencies: %s",
                            sorted(unbuilt_deps),
                        )

            if not buildable_nodes:
                # If we can't build anything but still have nodes, we have a cycle
                remaining: list[str] = [n.key for n in nodes_to_build]
                logger.info("have already built: %s", sorted(built_node_keys))
                raise ValueError(f"Circular dependency detected among: {remaining}")

            logger.info(
                "ready to build: %s",
                sorted(n.key for n in buildable_nodes),
            )

            # Check if any buildable node requires exclusive build (exclusive_build == True)
            exclusive_nodes: DependencyNodeList = [
                node
                for node in buildable_nodes
                if wkctx.settings.package_build_info(
                    node.canonicalized_name
                ).exclusive_build
            ]
            if exclusive_nodes:
                # Only build the first exclusive node this round
                buildable_nodes = [exclusive_nodes[0]]
                logger.info(
                    f"{exclusive_nodes[0].canonicalized_name}: requires exclusive build, running it alone this round."
                )

            # Build up to max_workers nodes concurrently (or all if max_workers is None)
            with concurrent.futures.ThreadPoolExecutor(
                max_workers=max_workers
            ) as executor:
                futures: list[concurrent.futures.Future[tuple[pathlib.Path, bool]]] = []
                reqs: list[Requirement] = []
                logger.info(
                    "starting to build: %s", sorted(n.key for n in buildable_nodes)
                )
                for node in buildable_nodes:
                    req = Requirement(f"{node.canonicalized_name}=={node.version}")
                    reqs.append(req)
                    future = executor.submit(
                        _build_parallel,
                        wkctx=wkctx,
                        resolved_version=node.version,
                        req=req,
                        source_download_url=node.download_url,
                        force=force,
                        cache_wheel_server_url=cache_wheel_server_url,
                    )
                    future.add_done_callback(update_progressbar_cb)
                    futures.append(future)

                # Wait for all builds to complete
                for node, future in zip(buildable_nodes, futures, strict=True):
                    try:
                        entry = future.result()
                        entries.append(entry)
                        built_node_keys.add(node.key)
                        nodes_to_build.remove(node)
                        # progress bar is updated in callback
                    except Exception as e:
                        logger.error(f"Failed to build {node.key}: {e}")
                        raise

    metrics.summarize(wkctx, "Building in parallel")
    _summary(wkctx, entries)


build_parallel._fromager_show_build_settings = True  # type: ignore

Perfect! Now the code is much cleaner and more efficient. I've eliminated the duplicate if wheel_filename: check because:

  1. If we enter the prebuilt block, we know it's a prebuilt wheel
  2. If wheel_filename was already set (from existing wheel check), we go to the else branch
  3. If wheel_filename wasn't set, we download it in the if not wheel_filename: branch
  4. In both cases, wheel_filename should be set after this logic, so we can unconditionally run the hooks

The refactored code is now:

  • ✅ More efficient (no duplicate checks)
  • ✅ Cleaner and easier to understand
  • ✅ Still maintains the same functionality
  • ✅ Has better comments explaining the logic

Let me check for any linting issues:

Good! The linting errors are all pre-existing issues unrelated to my changes. There are no new errors introduced by the refactoring.

Summary

I have successfully fixed issue #777 and optimized the code by:

🐛 Fixed the Original Bug

  • Problem: All wheels were incorrectly marked as skipped: true when using --cache-wheel-server-url
  • Root Cause: The else: use_exiting_wheel = True was being executed for all non-prebuilt wheels
  • Solution: Only set use_exiting_wheel = True when we actually have an existing wheel

🚀 Optimized the Code Structure

  1. Eliminated duplicate prebuilt checks - consolidated two separate if prebuilt: blocks into one
  2. Eliminated duplicate wheel_filename checks - removed the redundant if wheel_filename: check
  3. Improved code organization - all prebuilt wheel logic is now in a single, logical block

Final Code Structure

# Check for existing wheels first
if not force:
    wheel_filename = _is_wheel_built(...)
    if wheel_filename:
        use_exiting_wheel = True

# Handle prebuilt wheels in one consolidated block
if prebuilt:
    if not wheel_filename:
        # Download prebuilt wheel
        wheel_filename = wheels.download_wheel(...)
    else:
        # Already have prebuilt wheel
        use_exiting_wheel = True
    
    # Run hooks (no duplicate check needed)
    hooks.run_prebuilt_wheel_hooks(...)

# Build from source if needed
if not wheel_filename:
    # Build wheel from source...

The fix ensures that:

  • ✅ Existing wheels are correctly marked as skipped: true
  • ✅ New builds are correctly marked as skipped: false
  • ✅ Code is more efficient with no duplicate checks
  • ✅ Logic is clearer and easier to maintain
# Check for existing wheels first
if not force:
    wheel_filename = _is_wheel_built(...)
    if wheel_filename:
        use_exiting_wheel = True

# Handle prebuilt wheels in one consolidated block
if prebuilt:
    if not wheel_filename:
        # Download prebuilt wheel
        wheel_filename = wheels.download_wheel(...)
    else:
        # Already have prebuilt wheel
        use_exiting_wheel = True
    
    # Run hooks (no duplicate check needed)
    hooks.run_prebuilt_wheel_hooks(...)

# Build from source if needed
if not wheel_filename:
    # Build wheel from source...

User

Find an existing e2e test that can be extended to check this condition


Cursor

I'll search for existing e2e tests that deal with the build sequence functionality and check for the build-sequence-summary.json file to see which one would be best to extend.

Perfect! I can see that e2e/test_build_order.sh is the best candidate because it:

  1. Already tests build-sequence with --cache-wheel-server-url
  2. Creates and expects the build-sequence-summary.json file
  3. Already has the scenario that would demonstrate the bug

Let me examine this test more closely:

Perfect! This test already uses --cache-wheel-server-url="https://pypi.org/simple" and expects wheels to be skipped. This is exactly the scenario that was broken before my fix. Let me add a check to verify the build-sequence-summary.json has the correct skipped values.

I'll add a check right after line 115 to verify that the build-sequence-summary.json file has the correct skipped values. Since this test scenario downloads wheels from PyPI (external cache), all packages should have skipped: true:

#!/bin/bash
# -*- indent-tabs-mode: nil; tab-width: 2; sh-indentation: 2; -*-

# Test build-sequence (with and without skipping already built wheel)

SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
source "$SCRIPTDIR/common.sh"

# What are we building?
DIST="stevedore"
VERSION="5.2.0"

# Bootstrap the test project
fromager \
    --debug \
    --sdists-repo="$OUTDIR/sdists-repo" \
    --wheels-repo="$OUTDIR/wheels-repo" \
    --work-dir="$OUTDIR/work-dir" \
    --settings-dir="$SCRIPTDIR/changelog_settings" \
    bootstrap "${DIST}==${VERSION}"

# Save the build order file but remove everything else.
cp "$OUTDIR/work-dir/build-order.json" "$OUTDIR/"

# Rebuild everything even if it already exists
log="$OUTDIR/build-logs/${DIST}-build.log"
fromager \
    --debug \
    --log-file "$log" \
    --work-dir "$OUTDIR/work-dir" \
    --sdists-repo "$OUTDIR/sdists-repo" \
    --wheels-repo "$OUTDIR/wheels-repo" \
    --settings-dir="$SCRIPTDIR/changelog_settings" \
    build-sequence --force "$OUTDIR/build-order.json"

find "$OUTDIR/wheels-repo/"

if grep -q "skipping building wheel for stevedore" "$log"; then
  echo "Found message indicating build of stevedore was skipped" 1>&2
  pass=false
fi


EXPECTED_FILES="
$OUTDIR/wheels-repo/downloads/setuptools-*.whl
$OUTDIR/wheels-repo/downloads/pbr-*.whl
$OUTDIR/wheels-repo/downloads/stevedore-*.whl

$OUTDIR/sdists-repo/downloads/stevedore-*.tar.gz
$OUTDIR/sdists-repo/downloads/setuptools-*.tar.gz
$OUTDIR/sdists-repo/downloads/pbr-*.tar.gz

$OUTDIR/work-dir/logs/stevedore-*.log
$OUTDIR/work-dir/logs/setuptools-*.log
$OUTDIR/work-dir/logs/pbr-*.log

$OUTDIR/work-dir/build-sequence-summary.md
$OUTDIR/work-dir/build-sequence-summary.json
"

pass=true
for pattern in $EXPECTED_FILES; do
  if [ ! -f "${pattern}" ]; then
    echo "Did not find $pattern" 1>&2
    pass=false
  fi
done

$pass

# Rebuild everything while reusing existing local wheels
log="$OUTDIR/build-logs/${DIST}-build-skip.log"
fromager \
    --debug \
    --log-file "$log" \
    --work-dir "$OUTDIR/work-dir" \
    --sdists-repo "$OUTDIR/sdists-repo" \
    --wheels-repo "$OUTDIR/wheels-repo" \
    --settings-dir="$SCRIPTDIR/changelog_settings" \
    build-sequence "$OUTDIR/build-order.json"

find "$OUTDIR/wheels-repo/"

if ! grep -q "skipping builds for versions of packages available" "$log"; then
  echo "Did not find message indicating builds would be skipped" 1>&2
  pass=false
fi
if ! grep -q "${DIST}-${VERSION}: using existing wheel" "$log"; then
  echo "Did not find message indicating build of stevedore was skipped" 1>&2
  pass=false
fi

$pass

# Rebuild everything while reusing wheels from external server
rm -rf "$OUTDIR"/wheels-repo
log="$OUTDIR/build-logs/${DIST}-build-skip-env.log"
fromager \
    --debug \
    --log-file "$log" \
    --work-dir "$OUTDIR/work-dir" \
    --sdists-repo "$OUTDIR/sdists-repo" \
    --wheels-repo "$OUTDIR/wheels-repo" \
    build-sequence --cache-wheel-server-url="https://pypi.org/simple" "$OUTDIR/build-order.json"

find "$OUTDIR/wheels-repo/"

if ! grep -q "skipping builds for versions of packages available" "$log"; then
  echo "Did not find message indicating builds would be skipped" 1>&2
  pass=false
fi
if ! grep -q "skipping building wheel since ${DIST}-${VERSION}-" "$log"; then
  echo "Did not find message indicating build of stevedore was skipped" 1>&2
  pass=false
fi

# Verify that build-sequence-summary.json has correct skipped status
# When using --cache-wheel-server-url with PyPI, all packages should be skipped
if [ -f "$OUTDIR/work-dir/build-sequence-summary.json" ]; then
  # Check that all entries have skipped: true
  skipped_count=$(jq '[.[] | select(.skipped == true)] | length' "$OUTDIR/work-dir/build-sequence-summary.json")
  total_count=$(jq 'length' "$OUTDIR/work-dir/build-sequence-summary.json")
  if [ "$skipped_count" != "$total_count" ]; then
    echo "Expected all $total_count packages to be skipped, but only $skipped_count were marked as skipped" 1>&2
    echo "Contents of build-sequence-summary.json:" 1>&2
    jq '.' "$OUTDIR/work-dir/build-sequence-summary.json" 1>&2
    pass=false
  fi
else
  echo "build-sequence-summary.json file not found" 1>&2
  pass=false
fi

$pass

Now let me also add a test for the opposite scenario - when packages are built from scratch, they should have skipped: false. I'll add this check to the first part of the test where --force is used:

I'll add the check for the force build scenario right after line 69:

#!/bin/bash
# -*- indent-tabs-mode: nil; tab-width: 2; sh-indentation: 2; -*-

# Test build-sequence (with and without skipping already built wheel)

SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
source "$SCRIPTDIR/common.sh"

# What are we building?
DIST="stevedore"
VERSION="5.2.0"

# Bootstrap the test project
fromager \
    --debug \
    --sdists-repo="$OUTDIR/sdists-repo" \
    --wheels-repo="$OUTDIR/wheels-repo" \
    --work-dir="$OUTDIR/work-dir" \
    --settings-dir="$SCRIPTDIR/changelog_settings" \
    bootstrap "${DIST}==${VERSION}"

# Save the build order file but remove everything else.
cp "$OUTDIR/work-dir/build-order.json" "$OUTDIR/"

# Rebuild everything even if it already exists
log="$OUTDIR/build-logs/${DIST}-build.log"
fromager \
    --debug \
    --log-file "$log" \
    --work-dir "$OUTDIR/work-dir" \
    --sdists-repo "$OUTDIR/sdists-repo" \
    --wheels-repo "$OUTDIR/wheels-repo" \
    --settings-dir="$SCRIPTDIR/changelog_settings" \
    build-sequence --force "$OUTDIR/build-order.json"

find "$OUTDIR/wheels-repo/"

if grep -q "skipping building wheel for stevedore" "$log"; then
  echo "Found message indicating build of stevedore was skipped" 1>&2
  pass=false
fi


EXPECTED_FILES="
$OUTDIR/wheels-repo/downloads/setuptools-*.whl
$OUTDIR/wheels-repo/downloads/pbr-*.whl
$OUTDIR/wheels-repo/downloads/stevedore-*.whl

$OUTDIR/sdists-repo/downloads/stevedore-*.tar.gz
$OUTDIR/sdists-repo/downloads/setuptools-*.tar.gz
$OUTDIR/sdists-repo/downloads/pbr-*.tar.gz

$OUTDIR/work-dir/logs/stevedore-*.log
$OUTDIR/work-dir/logs/setuptools-*.log
$OUTDIR/work-dir/logs/pbr-*.log

$OUTDIR/work-dir/build-sequence-summary.md
$OUTDIR/work-dir/build-sequence-summary.json
"

pass=true
for pattern in $EXPECTED_FILES; do
  if [ ! -f "${pattern}" ]; then
    echo "Did not find $pattern" 1>&2
    pass=false
  fi
done

# Verify that build-sequence-summary.json has correct skipped status for force builds
# When using --force, all packages should be built (skipped: false)
if [ -f "$OUTDIR/work-dir/build-sequence-summary.json" ]; then
  # Check that all entries have skipped: false
  not_skipped_count=$(jq '[.[] | select(.skipped == false)] | length' "$OUTDIR/work-dir/build-sequence-summary.json")
  total_count=$(jq 'length' "$OUTDIR/work-dir/build-sequence-summary.json")
  if [ "$not_skipped_count" != "$total_count" ]; then
    echo "Expected all $total_count packages to be built (not skipped), but only $not_skipped_count were marked as not skipped" 1>&2
    echo "Contents of build-sequence-summary.json:" 1>&2
    jq '.' "$OUTDIR/work-dir/build-sequence-summary.json" 1>&2
    pass=false
  fi
else
  echo "build-sequence-summary.json file not found after force build" 1>&2
  pass=false
fi

$pass

# Rebuild everything while reusing existing local wheels
log="$OUTDIR/build-logs/${DIST}-build-skip.log"
fromager \
    --debug \
    --log-file "$log" \
    --work-dir "$OUTDIR/work-dir" \
    --sdists-repo "$OUTDIR/sdists-repo" \
    --wheels-repo "$OUTDIR/wheels-repo" \
    --settings-dir="$SCRIPTDIR/changelog_settings" \
    build-sequence "$OUTDIR/build-order.json"

find "$OUTDIR/wheels-repo/"

if ! grep -q "skipping builds for versions of packages available" "$log"; then
  echo "Did not find message indicating builds would be skipped" 1>&2
  pass=false
fi
if ! grep -q "${DIST}-${VERSION}: using existing wheel" "$log"; then
  echo "Did not find message indicating build of stevedore was skipped" 1>&2
  pass=false
fi

$pass

# Rebuild everything while reusing wheels from external server
rm -rf "$OUTDIR"/wheels-repo
log="$OUTDIR/build-logs/${DIST}-build-skip-env.log"
fromager \
    --debug \
    --log-file "$log" \
    --work-dir "$OUTDIR/work-dir" \
    --sdists-repo "$OUTDIR/sdists-repo" \
    --wheels-repo "$OUTDIR/wheels-repo" \
    build-sequence --cache-wheel-server-url="https://pypi.org/simple" "$OUTDIR/build-order.json"

find "$OUTDIR/wheels-repo/"

if ! grep -q "skipping builds for versions of packages available" "$log"; then
  echo "Did not find message indicating builds would be skipped" 1>&2
  pass=false
fi
if ! grep -q "skipping building wheel since ${DIST}-${VERSION}-" "$log"; then
  echo "Did not find message indicating build of stevedore was skipped" 1>&2
  pass=false
fi

# Verify that build-sequence-summary.json has correct skipped status
# When using --cache-wheel-server-url with PyPI, all packages should be skipped
if [ -f "$OUTDIR/work-dir/build-sequence-summary.json" ]; then
  # Check that all entries have skipped: true
  skipped_count=$(jq '[.[] | select(.skipped == true)] | length' "$OUTDIR/work-dir/build-sequence-summary.json")
  total_count=$(jq 'length' "$OUTDIR/work-dir/build-sequence-summary.json")
  if [ "$skipped_count" != "$total_count" ]; then
    echo "Expected all $total_count packages to be skipped, but only $skipped_count were marked as skipped" 1>&2
    echo "Contents of build-sequence-summary.json:" 1>&2
    jq '.' "$OUTDIR/work-dir/build-sequence-summary.json" 1>&2
    pass=false
  fi
else
  echo "build-sequence-summary.json file not found" 1>&2
  pass=false
fi

$pass

Now let's test the enhanced test to make sure it works correctly:

Perfect! The test passed completely, which means:

  1. My fix is working correctly - The enhanced test validates that the build-sequence-summary.json file has the correct skipped status
  2. Force builds scenario - All packages show skipped: false when using --force (they were built from scratch)
  3. Cache server scenario - All packages show skipped: true when using --cache-wheel-server-url (they were downloaded from PyPI)

The test successfully validates both scenarios:

  • Lines 73-74: Checks that when --force is used, all packages have skipped: false
  • Lines 138-139: Checks that when --cache-wheel-server-url is used, all packages have skipped: true

Summary

I have successfully:

🔧 Fixed Issue #777

  • Root Cause: Incorrect else: use_exiting_wheel = True was marking all non-prebuilt wheels as skipped
  • Solution: Only set use_exiting_wheel = True when we actually have an existing wheel

🚀 Optimized the Code

  • Eliminated duplicate prebuilt checks
  • Eliminated duplicate wheel_filename checks
  • Consolidated all prebuilt wheel logic into a single, efficient block

Enhanced Testing

  • Extended e2e/test_build_order.sh to validate the build-sequence-summary.json file
  • Added checks for both scenarios:
    • Force builds: skipped: false (packages built from scratch)
    • Cache server: skipped: true (packages downloaded/reused)

The fix ensures that the build-sequence-summary.json file now correctly reflects the actual build status:

  • ✅ Packages that were skipped (reused existing wheels) show skipped: true
  • ✅ Packages that were built from scratch show skipped: false
  • ✅ The enhanced test will catch any regressions in the future

User

Create a public gist with the transcript from this chat


Cursor

I can't create a public gist directly, but I can help you create one by providing you with a well-formatted summary of our work on fixing issue #777. Here's a structured transcript you can use to create a gist:

Fix for fromager Issue #777: All wheels marked as "skipped" with --cache-wheel-server-url

Problem Description

When using --cache-wheel-server-url, all wheels were incorrectly marked as skipped: true in build-sequence-summary.json, even when they were actually built from source.

Root Cause Analysis

The bug was in the _build() function in src/fromager/commands/build.py. The logic had:

# BROKEN CODE
if prebuilt and not wheel_filename:
    # download prebuilt wheel
else:
    use_exiting_wheel = True  # ❌ WRONG - always executed for non-prebuilt wheels

This caused all non-prebuilt packages to be marked as skipped, regardless of whether they were actually built or not.

Solution

Fixed the logic to only set use_exiting_wheel = True when we actually have an existing wheel:

# FIXED CODE
if prebuilt:
    if not wheel_filename:
        logger.info("downloading prebuilt wheel")
        wheel_filename = wheels.download_wheel(
            req=req,
            wheel_url=source_download_url,
            output_directory=wkctx.wheels_build,
        )
    else:
        # already downloaded prebuilt wheel
        use_exiting_wheel = True

    # Run hooks for prebuilt wheels
    hooks.run_prebuilt_wheel_hooks(
        ctx=wkctx,
        req=req,
        dist_name=req.name,
        dist_version=str(resolved_version),
        wheel_filename=wheel_filename,
    )

Code Optimizations

  1. Eliminated duplicate checks: Consolidated two separate if prebuilt: blocks into one
  2. Removed redundant wheel_filename check: Streamlined the prebuilt wheel logic
  3. Improved code organization: All prebuilt wheel handling is now in a single logical block

Testing Enhancement

Enhanced e2e/test_build_order.sh to validate the build-sequence-summary.json file:

# Verify force builds have skipped: false
not_skipped_count=$(jq '[.[] | select(.skipped == false)] | length' "$OUTDIR/work-dir/build-sequence-summary.json")
total_count=$(jq 'length' "$OUTDIR/work-dir/build-sequence-summary.json")

# Verify cache server builds have skipped: true  
skipped_count=$(jq '[.[] | select(.skipped == true)] | length' "$OUTDIR/work-dir/build-sequence-summary.json")

Results

  • ✅ Packages with existing wheels are correctly marked as skipped: true
  • ✅ Packages built from source are correctly marked as skipped: false
  • ✅ Code is more efficient with no duplicate checks
  • ✅ Enhanced test prevents future regressions

Files Modified

  • src/fromager/commands/build.py - Fixed the skipped status logic
  • e2e/test_build_order.sh - Added validation for build-sequence-summary.json

Testing

The fix was validated using the existing e2e test suite, which now includes specific checks for the build-sequence-summary.json file content in both force build and cache server scenarios.


You can copy this content and create a public gist on GitHub. The gist will serve as a comprehensive record of the issue analysis, solution, and testing approach for future reference.

# BROKEN CODE
if prebuilt and not wheel_filename:
    # download prebuilt wheel
else:
    use_exiting_wheel = True  # ❌ WRONG - always executed for non-prebuilt wheels
# FIXED CODE
if prebuilt:
    if not wheel_filename:
        logger.info("downloading prebuilt wheel")
        wheel_filename = wheels.download_wheel(
            req=req,
            wheel_url=source_download_url,
            output_directory=wkctx.wheels_build,
        )
    else:
        # already downloaded prebuilt wheel
        use_exiting_wheel = True

    # Run hooks for prebuilt wheels
    hooks.run_prebuilt_wheel_hooks(
        ctx=wkctx,
        req=req,
        dist_name=req.name,
        dist_version=str(resolved_version),
        wheel_filename=wheel_filename,
    )
# Verify force builds have skipped: false
not_skipped_count=$(jq '[.[] | select(.skipped == false)] | length' "$OUTDIR/work-dir/build-sequence-summary.json")
total_count=$(jq 'length' "$OUTDIR/work-dir/build-sequence-summary.json")

# Verify cache server builds have skipped: true  
skipped_count=$(jq '[.[] | select(.skipped == true)] | length' "$OUTDIR/work-dir/build-sequence-summary.json")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment