@rkoster
Last active February 20, 2026 09:39
Comparison of bosh-agent PR #396 (AWS NVMe Instance Storage) vs PR #402 (Azure NVMe Support)

Comparison: bosh-agent PR #396 vs PR #402

Overview

Both PRs address NVMe device discovery, but they target different cloud providers and take fundamentally different approaches.

| Aspect | PR #396 (AWS) | PR #402 (Azure) |
| --- | --- | --- |
| URL | cloudfoundry/bosh-agent#396 | cloudfoundry/bosh-agent#402 |
| Cloud Provider | AWS | Azure |
| Problem | Non-deterministic PCIe enumeration order for NVMe instance storage | Azure v6+ VMs use NVMe instead of SCSI, breaking disk discovery |
| Disk Type | Instance/ephemeral storage only | Data disks (ephemeral + persistent) |

Problem Statements

PR #396: AWS Instance Storage Discovery

On AWS Nitro-based instances, the kernel's PCIe enumeration order is non-deterministic:

  • /dev/nvme0n1 could be the root EBS volume OR instance storage
  • /dev/nvme1n1 could be instance storage OR the root EBS volume
  • Order varies between boots and instance types

Challenge: Identify which NVMe devices are instance storage vs EBS volumes.

PR #402: Azure NVMe Device Resolution

Azure v6+ VM sizes (Dv6, Dasv6, Ev6, etc.) use NVMe controllers instead of SCSI:

  • Existing scsiLunDevicePathResolver scans /sys/bus/vmbus/devices/ paths that don't exist on NVMe VMs
  • Agent cannot resolve ephemeral or persistent disks on NVMe hardware

Challenge: Resolve a LUN number to its actual device path on NVMe hardware.


Solution Approaches

PR #396: Exclusion-Based Discovery

Algorithm:

  1. Glob all NVMe devices: /dev/nvme*n1
  2. Glob EBS symlinks: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_*
  3. Resolve each EBS symlink to its target device
  4. Subtract the EBS devices from the full NVMe device list; whatever remains is instance storage
  5. Validate count matches CPI expectations

Key insight: AWS automatically creates persistent symlinks for EBS volumes via udev rules. Instance storage is identified by what it's not.
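
The five steps above could be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the code from PR #396: the names `subtractDevices` and `discoverInstanceStorage` are hypothetical, and the glob patterns are passed in as parameters so the logic can be exercised outside a real AWS instance.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// subtractDevices returns the entries of all that do not appear in exclude,
// sorted so the result is stable across calls.
func subtractDevices(all, exclude []string) []string {
	skip := make(map[string]struct{}, len(exclude))
	for _, dev := range exclude {
		skip[dev] = struct{}{}
	}
	remaining := []string{}
	for _, dev := range all {
		if _, found := skip[dev]; !found {
			remaining = append(remaining, dev)
		}
	}
	sort.Strings(remaining)
	return remaining
}

// discoverInstanceStorage is a hypothetical helper that mirrors the
// exclusion algorithm; it is not the PR's actual implementation.
func discoverInstanceStorage(nvmeGlob, ebsGlob string, expected int) ([]string, error) {
	all, err := filepath.Glob(nvmeGlob) // step 1: every NVMe device
	if err != nil {
		return nil, fmt.Errorf("globbing NVMe devices: %w", err)
	}
	links, err := filepath.Glob(ebsGlob) // step 2: EBS by-id symlinks
	if err != nil {
		return nil, fmt.Errorf("globbing EBS symlinks: %w", err)
	}
	ebs := make([]string, 0, len(links))
	for _, link := range links {
		target, err := filepath.EvalSymlinks(link) // step 3: symlink -> device
		if err != nil {
			return nil, fmt.Errorf("resolving %s: %w", link, err)
		}
		ebs = append(ebs, target)
	}
	instance := subtractDevices(all, ebs) // step 4: what is left is instance storage
	if len(instance) != expected {        // step 5: compare with the expected count
		return nil, fmt.Errorf("found %d instance storage devices, expected %d",
			len(instance), expected)
	}
	return instance, nil
}
```

On a Nitro instance the call would presumably look like `discoverInstanceStorage("/dev/nvme*n1", "/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_*", 2)`, where `2` stands in for the count supplied by the CPI. Symlinks that resolve to partitions are harmless here, since partition paths never appear in the device list being filtered.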

PR #402: Symlink-Based Resolution

Algorithm:

  1. Look up symlink at <basePath>/<LUN> (e.g., /dev/disk/azure/data/by-lun/1)
  2. Follow symlink to real device path (e.g., /dev/nvme0n3)
  3. If symlink doesn't exist, fall back to SCSI resolver

Key insight: Azure's azure-vm-utils creates stable LUN-to-device symlinks. Resolution is a direct lookup.
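
As a rough sketch of the lookup described above, the Go snippet below follows a `<basePath>/<LUN>` symlink and reports whether it existed. The names `lunSymlinkPath` and `resolveLunSymlink` are hypothetical and the signature is simplified; the PR's real resolver plugs into the agent's existing `DevicePathResolver` interface, which is not reproduced here.

```go
package main

import (
	"errors"
	"io/fs"
	"path/filepath"
	"strconv"
)

// lunSymlinkPath builds <basePath>/<LUN>,
// e.g. /dev/disk/azure/data/by-lun/1.
func lunSymlinkPath(basePath string, lun int) string {
	return filepath.Join(basePath, strconv.Itoa(lun))
}

// resolveLunSymlink follows the LUN symlink to the real device path.
// found is false (with a nil error) when the symlink does not exist,
// which is the caller's cue to fall back to the SCSI resolver.
func resolveLunSymlink(basePath string, lun int) (devicePath string, found bool, err error) {
	realPath, err := filepath.EvalSymlinks(lunSymlinkPath(basePath, lun))
	if err == nil {
		return realPath, true, nil
	}
	if errors.Is(err, fs.ErrNotExist) {
		return "", false, nil
	}
	// Any other failure (permissions, symlink loop) is reported as an error.
	return "", false, err
}
```

Distinguishing "symlink missing" from other failures keeps the fallback decision explicit: only a missing symlink suggests a non-NVMe VM, while a permission error probably should not be silently masked by the SCSI path.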


Architecture Comparison

PR #396 Components

| Component | Purpose |
| --- | --- |
| InstanceStorageResolver | New interface for instance storage discovery |
| awsNVMeInstanceStorageResolver | Filters devices by checking for EBS symlinks |
| autoDetectingInstanceStorageResolver | Lazy initialization wrapper, auto-detects NVMe |
| identityInstanceStorageResolver | Pass-through for non-NVMe instances |

PR #402 Components

| Component | Purpose |
| --- | --- |
| SymlinkLunDevicePathResolver | Resolves LUN to device via symlink path |
| FallbackDevicePathResolver | Generic compositor (primary → secondary resolver) |
| No new interface | Extends existing DevicePathResolver |
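
The primary → secondary composition might look something like the sketch below. The `resolver` interface here is a deliberately simplified, hypothetical stand-in for the agent's `DevicePathResolver`, and `resolverFunc`, `fallbackResolver`, and `errNotFound` are illustrative names only.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is an illustrative sentinel for "this resolver has no answer".
var errNotFound = errors.New("device not found")

// resolver is a simplified, hypothetical stand-in for DevicePathResolver.
type resolver interface {
	Resolve(lun string) (string, error)
}

// resolverFunc adapts a plain function to the resolver interface.
type resolverFunc func(lun string) (string, error)

func (f resolverFunc) Resolve(lun string) (string, error) { return f(lun) }

// fallbackResolver tries primary first and consults secondary only
// when primary returns an error.
type fallbackResolver struct {
	primary   resolver
	secondary resolver
}

func (r fallbackResolver) Resolve(lun string) (string, error) {
	path, primaryErr := r.primary.Resolve(lun)
	if primaryErr == nil {
		return path, nil
	}
	path, secondaryErr := r.secondary.Resolve(lun)
	if secondaryErr != nil {
		// Report both failures so neither cause is lost.
		return "", fmt.Errorf("primary: %v; secondary: %w", primaryErr, secondaryErr)
	}
	return path, nil
}
```

Because the compositor knows nothing about symlinks or SCSI, the same type could in principle wrap any two resolvers, which is what makes the pattern reusable beyond Azure.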

Key Differences

1. Discovery vs Resolution

| | PR #396 | PR #402 |
| --- | --- | --- |
| Operation | Discovery | Resolution |
| Question | "Which devices are instance storage?" | "What device is LUN X?" |
| Input | Count of expected devices | LUN number |
| Output | List of device paths | Single device path |

2. Symlink Semantics

PR #396 (AWS) - Symlinks identify devices to EXCLUDE:
/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0123 → /dev/nvme0n1 (EBS, exclude this)

PR #402 (Azure) - Symlinks point TO the device you want:
/dev/disk/azure/data/by-lun/1 → /dev/nvme0n3 (use this)

3. Dependencies

| | PR #396 | PR #402 |
| --- | --- | --- |
| External tooling | None (uses existing AWS udev rules) | Requires azure-vm-utils in stemcell |
| CPI changes | Yes (bosh-aws-cpi-release#196) | No |
| Configuration | Auto-detects NVMe instances | Opt-in via LunDeviceSymlinkPath |

4. Code Quality

| Aspect | PR #396 | PR #402 |
| --- | --- | --- |
| Dead code | Unused devicePathResolver field | Clean |
| Thread safety | Potential race in lazy init | No lazy init |
| Reusability | AWS-specific | FallbackDevicePathResolver is generic |
| Bug fixes | None | Fixes NVMe regex for multi-digit partitions |
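
For context on the thread-safety row: a lazy initialization race in Go is commonly avoided with `sync.Once`. The sketch below shows that general pattern only; `lazyResolver` and its `detect` field are hypothetical, and nothing here is taken from, or proposed as a patch to, PR #396.

```go
package main

import "sync"

// lazyResolver defers an expensive detection step until first use.
// sync.Once guarantees detect runs exactly once, even when Devices
// is called from several goroutines at the same time.
type lazyResolver struct {
	once    sync.Once
	detect  func() []string // hypothetical detection step
	devices []string
}

// Devices returns the cached detection result, running detect on first call.
func (r *lazyResolver) Devices() []string {
	r.once.Do(func() {
		r.devices = r.detect()
	})
	return r.devices
}
```

An unguarded `if r.devices == nil { ... }` check, by contrast, can let two goroutines run detection concurrently and write the field without synchronization.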

Can PR #402 Solve PR #396's Problem?

No. The approaches are not interchangeable because:

  1. No LUN mapping on AWS instance storage: AWS instance storage doesn't have LUN identifiers. The CPI doesn't know which /dev/nvme* device will be instance storage.

  2. Different identification model:

    • Azure: "Here's LUN 1, find its device" (direct lookup)
    • AWS: "Here are 2 instance storage devices, find them" (discovery by exclusion)

  3. Different symlink purposes:

    • Azure symlinks point to what you want
    • AWS symlinks identify what to exclude

For PR #402's pattern to work on AWS, AWS would need symlinks like /dev/disk/aws/instance-storage/0, and no such symlinks exist.


Compatibility

Both PRs are complementary and can coexist:

  • PR #396 adds InstanceStorageResolver for AWS instance storage discovery
  • PR #402 adds SymlinkLunDevicePathResolver for Azure LUN resolution
  • Different code paths, different use cases, no conflicts

Summary

| | PR #396 | PR #402 |
| --- | --- | --- |
| Core problem | Identify instance storage among NVMe devices | Resolve LUN to NVMe device path |
| Approach | Exclusion (subtract EBS from all NVMe) | Lookup (follow LUN symlink) |
| Scope | AWS Nitro instances | Azure v6+ VMs |
| Interface | New InstanceStorageResolver | Existing DevicePathResolver |
| Reusability | AWS-specific logic | Generic fallback pattern |