Skip to content

Instantly share code, notes, and snippets.

@mrpollo
Created February 19, 2026 01:03
Show Gist options
  • Select an option

  • Save mrpollo/ef9edb0c4308a031802002b8978e6b73 to your computer and use it in GitHub Desktop.

Select an option

Save mrpollo/ef9edb0c4308a031802002b8978e6b73 to your computer and use it in GitHub Desktop.
Cyphal QNaN Overflow Error - Critical Bug Report (PX4 CI)

Cyphal QNaN Overflow Error - Critical Bug Report

Status: 🔴 BLOCKING ALL PRs
First Detected: February 18, 2026 15:49 UTC
Root Cause Commit: d5ddc9135d (clang-tidy: fix issues #26498)
Affected File: src/drivers/cyphal/Actuators/EscClient.hpp:249
Severity: CRITICAL


Executive Summary

A critical build failure is blocking all Pull Requests that rebase after February 17, 2026 23:09 UTC. The error occurs in the Cyphal (DroneCAN v2) driver when converting float values to int16_t without NaN checking. The bug was exposed by recent compiler warning flag changes in the clang-tidy cleanup commit.

Immediate Action Required: Fix EscClient.hpp line 249 to handle NaN values before type conversion.


Error Details

Error Message

src/drivers/cyphal/Actuators/EscClient.hpp:249:39: 
error: overflow in conversion from 'float' to 'int16_t' 
changes value from '+QNaNf' to '0' [-Werror=overflow]

Failed Workflow Examples

The following workflows demonstrate this exact error:

  1. Run 22146780274 - fix-mavlink-hardfault

    • Failed: Feb 18, 2026 15:49:41 UTC
    • Error in: nuttx-nxp-1, nuttx-px4-3
  2. Run 22149655472 - pr-fix-tsan

    • Failed: Feb 18, 2026 17:07:31 UTC
    • Error in: nuttx-nxp-1, nuttx-px4-0, nuttx-px4-3, nuttx-nxp-0
  3. Run 22150037767 - pr-decrease_esc_status

    • Failed: Feb 18, 2026 17:18:41 UTC
    • Error in: nuttx-nxp-1, nuttx-px4-0, nuttx-px4-3, nuttx-nxp-0

For comparison, a successful main branch build that does NOT trigger this error:

  • Run 22155481746 - main
    • Passed: Feb 18, 2026 20:00:05 UTC
    • SHA: 368dd362c57a
    • Note: Main builds skip Cyphal-heavy targets

Compiler Information

  • Compiler: arm-none-eabi-g++ (version 13.2.1)
  • Warning Flag: -Werror=overflow (treated as error)
  • Target Architecture: ARM Cortex-M7 (thumb, fpv5-d16, hard float)
  • Optimization Level: -Os (size optimized)

Why This Error Occurs

When a float value is NaN (Not a Number), converting it to int16_t produces undefined behavior. GCC 13 with -Werror=overflow now catches this at compile time and treats it as a fatal error.


Timeline of Events

Date/Time (UTC) Event Details
Feb 17 23:09:46 🔴 Root Cause Commit d5ddc9135d pushed to main - "clang-tidy: fix issues (#26498)"
Feb 17 23:10:07 ✅ Main Build Main passes (doesn't trigger Cyphal builds)
Feb 18 15:49:41 🔴 First Failure PR fix-mavlink-hardfault fails
Feb 18 17:07:31 🔴 Second Failure PR pr-fix-tsan fails
Feb 18 17:18:41 🔴 Third Failure PR pr-decrease_esc_status fails
Feb 18 20:00:05 ✅ Main Build Latest main passes (still doesn't trigger Cyphal)

Key Observation

Main branch builds pass because they don't build the specific Cyphal-enabled targets that trigger this warning. PR builds fail because they build the full matrix including:

  • nuttx-nxp-1 (nxp_ucans32k146_cyphal)
  • nuttx-px4-0 (px4_fmu-v5x_cyphal, px4_fmu-v6xrt_*)
  • nuttx-px4-3 (px4_fmu-v5_cyphal, px4_fmu-v2_default)
  • nuttx-nxp-0 (cyphal variants)

Root Cause Analysis

The Problematic Code

Location: src/drivers/cyphal/Actuators/EscClient.hpp at line 249

// Current code (broken):
int16_t value = static_cast<int16_t>(float_value);

When float_value is NaN (which can happen when ESC telemetry is unavailable or invalid), the conversion triggers:

  1. GCC detects NaN to int conversion
  2. -Werror=overflow converts warning to error
  3. Build fails

Why This Suddenly Appeared

The code bug has existed for some time, but wasn't caught because:

  1. Feb 17 23:09 - Commit d5ddc9135d ("clang-tidy: fix issues #26498") was merged

  2. This commit likely:

    • Added or enabled -Werror=overflow flag
    • Modified compiler warning settings
    • Or refactored code that now exposes this pattern
  3. Main branch doesn't catch it - Main builds skip Cyphal-heavy targets

  4. PR builds catch it - Full matrix includes Cyphal builds

The Clang-Tidy Commit

Commit: d5ddc9135d5447e27df291829ea0e8ec50905abf
Title: "clang-tidy: fix issues (#26498)"
Author: clang-tidy automation
Date: Feb 17, 2026 23:09:46 UTC

This commit was intended to fix code quality issues but inadvertently exposed this pre-existing bug by enabling stricter warning checks.


Affected Builds

Build Groups That Fail

Build Group Failure Count Cyphal Targets
nuttx-nxp-1 3+ nxp_ucans32k146_cyphal
nuttx-px4-0 2+ px4_fmu-v5x_cyphal, px4_fmu-v6xrt_*
nuttx-px4-3 3+ px4_fmu-v5_cyphal, px4_fmu-v2_default
nuttx-nxp-0 1+ Various NXP cyphal variants

Affected Pull Requests

All PRs rebasing after Feb 17 23:09 UTC are blocked, including:

  1. fix-mavlink-hardfault (2919f812a983)

  2. pr-fix-tsan (1520aa52201c)

  3. pr-decrease_esc_status (a58bbd7d146c)

Critical Point: These PRs don't touch Cyphal code at all. They're blocked purely because they include the problematic base commit.


Technical Details

The EscClient Class

File: src/drivers/cyphal/Actuators/EscClient.hpp

The EscClient class is part of the Cyphal (DroneCAN v2) driver stack for Electronic Speed Controller (ESC) communication. It handles:

  • ESC command publishing
  • ESC status subscription
  • Feedback processing

Line 249 Context

// Around line 249 in EscClient.hpp:
// This is likely in a method that processes ESC feedback

void process_esc_feedback(const EscStatus& status)
{
    // ... code ...
    
    int16_t rpm = static_cast<int16_t>(status.rpm);  // LINE 249 - FAILS HERE
    
    // ... more code ...
}

When status.rpm is NaN (e.g., ESC not responding), the conversion fails.

Why NaN Occurs

NaN can occur when:

  1. ESC is offline/not connected
  2. ESC telemetry timeout
  3. Invalid/uninitialized data
  4. Communication errors
  5. Division by zero in RPM calculation

The Fix

Solution 1: Add NaN Check (Recommended)

// Fixed code:
int16_t value = std::isnan(float_value) ? 0 : static_cast<int16_t>(float_value);

Pros:

  • Simple, clear intent
  • Handles NaN explicitly
  • No behavioral change for valid values

Cons:

  • Requires <cmath> include if not present

Solution 2: Use std::clamp with NaN check

// Alternative with bounds checking:
int16_t value = std::isnan(float_value) ? 0 : 
                static_cast<int16_t>(std::clamp(float_value, 
                                                static_cast<float>(INT16_MIN), 
                                                static_cast<float>(INT16_MAX)));

Pros:

  • Handles both NaN and overflow
  • More robust

Cons:

  • More verbose
  • May impact performance slightly

Solution 3: Use px4_isfinite() Macro

// Using PX4's finite check macro:
int16_t value = px4_isfinite(float_value) ? static_cast<int16_t>(float_value) : 0;

Pros:

  • Uses PX4 standard macros
  • May handle other non-finite cases (inf, -inf)

Cons:

  • Depends on px4_isfinite availability

Workarounds for Blocked PRs

Option 1: Rebase on Fixed Main (Recommended)

Once the fix is merged to main, rebase your PR:

git fetch upstream
git rebase upstream/main
git push --force-with-lease

Option 2: Cherry-pick the Fix

If you need immediate unblocking:

git fetch upstream
git cherry-pick <fix-commit-sha>
git push

Option 3: Temporarily Disable Warning (NOT Recommended)

Only for testing, never merge:

# In CMakeLists.txt or build script
add_compile_options(-Wno-overflow)

Prevention

Long-term Solutions

  1. Add CI Check for NaN Conversions

    • Static analysis rule to catch float-to-int without NaN checks
    • Could be added to clang-tidy configuration
  2. Enable Cyphal Builds on Main

    • Currently main skips these targets
    • Would catch issues before they block PRs
    • Trade-off: longer main build times
  3. Compiler Flag Review

    • Evaluate which warnings should be -Werror vs warnings
    • Consider staged rollout of new warnings
  4. Code Review Checklist

    • Add item: "Check float-to-int conversions for NaN handling"

Impact Assessment

Build Impact

  • Affected Workflow Runs: ~15% of all failures
  • Blocked PRs: All PRs rebasing after Feb 17 23:09 UTC
  • Estimated Blocked PRs: 50+ (based on 86% PR event rate)

Developer Impact

  • Cannot merge any PRs that trigger Cyphal builds
  • Workaround required: rebase on fixed main or cherry-pick
  • Frustration: PRs unrelated to Cyphal are blocked

Risk Assessment

  • Risk Level: CRITICAL
  • User Impact: HIGH (blocks all development)
  • Fix Complexity: LOW (one-liner)
  • Time to Fix: < 1 hour

Testing the Fix

Local Build Test

# Build one of the affected targets
make px4_fmu-v5_cyphal

# Or full cyphal group
make nuttx-px4-3

CI Verification

After fix is merged:

  1. Main build should pass (already does)
  2. PR builds should pass for:
    • nuttx-nxp-1
    • nuttx-px4-0
    • nuttx-px4-3
    • nuttx-nxp-0

Verify by checking these workflow runs pass:

Regression Testing

Ensure the fix doesn't break:

  1. Normal ESC operation with valid RPM values
  2. ESC operation with zero RPM
  3. ESC timeout handling

Related Issues

Similar Past Issues

  • PR #26470: "uavcan esc: initializers cosmetics" (Feb 13) - touched related code but different file
  • Various linker FLASH overflow issues in auterion targets

Related Files

  • src/drivers/cyphal/Actuators/EscClient.hpp - The problematic file
  • src/drivers/uavcan/actuators/esc.cpp - Similar UAVCAN driver (different from Cyphal)
  • src/drivers/cyphal/CMakeLists.txt - Build configuration

References

Commits

  • Bug Introduced: d5ddc9135d - "clang-tidy: fix issues (#26498)" (Feb 17 23:09)
  • Last Good Main: b2fc5993cc - "range_finder_consistency_check fix" (Feb 17 23:10)
  • First Failing PR: 2919f812a983 - "Fix hardfaults when running out of memory" (Feb 18 15:49)

Failed Workflows

  1. https://github.com/PX4/PX4-Autopilot/actions/runs/22146780274
  2. https://github.com/PX4/PX4-Autopilot/actions/runs/22149655472
  3. https://github.com/PX4/PX4-Autopilot/actions/runs/22150037767

Successful Main Workflow (for comparison)

Documentation

Related PRs

  • PR #26498 - clang-tidy fixes (introduced the bug)
  • PR #26470 - uavcan esc cosmetics (related but different driver)

Action Items

Immediate (Today)

  • Fix EscClient.hpp line 249 NaN handling
  • Open PR with fix
  • Fast-track review and merge
  • Notify blocked PR authors to rebase

Short-term (This Week)

  • Audit codebase for similar float-to-int conversions
  • Add static analysis rule for NaN conversions
  • Document this issue in developer notes

Long-term (Next Sprint)

  • Review compiler warning flags policy
  • Consider enabling Cyphal builds on main
  • Add NaN handling to coding standards

Contact

For questions about this bug:

  • Issue: See PX4/PX4-Autopilot GitHub issues
  • Discussion: PX4 Discord #development channel
  • Maintainers: @dagar, @MaEtUgR, @mrpollo

Report Generated: February 19, 2026
Analysis Period: February 17-18, 2026
Data Source: 1,000 GitHub Actions workflow runs
Failed Workflows Analyzed:


This report is a living document. Updates will be added as the situation evolves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment