Both PRs address NVMe device discovery challenges but for different cloud providers with fundamentally different approaches.
| Aspect | PR #396 (AWS) | PR #402 (Azure) |
|---|---|---|
| URL | cloudfoundry/bosh-agent#396 | cloudfoundry/bosh-agent#402 |
| Cloud Provider | AWS | Azure |
| Problem | Non-deterministic PCIe enumeration order for NVMe instance storage | Azure v6+ VMs use NVMe instead of SCSI, breaking disk discovery |
| Disk Type | Instance/ephemeral storage only | Data disks (ephemeral + persistent) |
On AWS Nitro-based instances, the kernel's PCIe enumeration order is non-deterministic:
- `/dev/nvme0n1` could be the root EBS volume OR instance storage
- `/dev/nvme1n1` could be instance storage OR the root EBS volume
- Order varies between boots and instance types
Challenge: Identify which NVMe devices are instance storage vs EBS volumes.
Azure v6+ VM sizes (Dv6, Dasv6, Ev6, etc.) use NVMe controllers instead of SCSI:
- Existing `scsiLunDevicePathResolver` scans `/sys/bus/vmbus/devices/` paths that don't exist on NVMe VMs
- Agent cannot resolve ephemeral or persistent disks on NVMe hardware
Challenge: Resolve a LUN number to its actual device path on NVMe hardware.
Algorithm:
- Glob all NVMe devices: `/dev/nvme*n1`
- Glob EBS symlinks: `/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_*`
- Resolve each EBS symlink to its target device
- Subtract EBS devices from all NVMe devices = instance storage
- Validate count matches CPI expectations
Key insight: AWS automatically creates persistent symlinks for EBS volumes via udev rules. Instance storage is identified by what it's not.
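The exclusion steps above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from PR #396: the function names `excludeEBS` and `discoverInstanceStorage` and their signatures are hypothetical, and only the two glob patterns come from the description above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// excludeEBS returns the devices in all that are not listed in ebs, keeping
// the order of all, and checks the result against the expected count.
// Hypothetical helper; not taken from PR #396.
func excludeEBS(all, ebs []string, expected int) ([]string, error) {
	isEBS := make(map[string]bool, len(ebs))
	for _, dev := range ebs {
		isEBS[dev] = true
	}

	var instance []string
	for _, dev := range all {
		if !isEBS[dev] {
			instance = append(instance, dev)
		}
	}

	if len(instance) != expected {
		return nil, fmt.Errorf("found %d instance storage devices, expected %d",
			len(instance), expected)
	}
	return instance, nil
}

// discoverInstanceStorage gathers the inputs from the filesystem and applies
// the exclusion. The glob patterns are the ones named in the algorithm.
func discoverInstanceStorage(expected int) ([]string, error) {
	all, err := filepath.Glob("/dev/nvme*n1")
	if err != nil {
		return nil, fmt.Errorf("globbing NVMe devices: %w", err)
	}
	links, err := filepath.Glob("/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_*")
	if err != nil {
		return nil, fmt.Errorf("globbing EBS symlinks: %w", err)
	}

	// Resolve each EBS symlink to the device node it points at.
	ebs := make([]string, 0, len(links))
	for _, link := range links {
		target, err := filepath.EvalSymlinks(link)
		if err != nil {
			return nil, fmt.Errorf("resolving %s: %w", link, err)
		}
		ebs = append(ebs, target)
	}
	return excludeEBS(all, ebs, expected)
}
```

Keeping the set subtraction separate from the filesystem access makes the core logic testable without real device nodes. Symlinks that resolve to partitions (for example a `p1` suffix) simply never match a whole-device path and are ignored.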
Algorithm:
- Look up symlink at `<basePath>/<LUN>` (e.g., `/dev/disk/azure/data/by-lun/1`)
- Follow symlink to real device path (e.g., `/dev/nvme0n3`)
- If symlink doesn't exist, fall back to SCSI resolver
Key insight: Azure's azure-vm-utils creates stable LUN-to-device symlinks. Resolution is a direct lookup.
| Component | Purpose |
|---|---|
| `InstanceStorageResolver` | New interface for instance storage discovery |
| `awsNVMeInstanceStorageResolver` | Filters devices by checking for EBS symlinks |
| `autoDetectingInstanceStorageResolver` | Lazy initialization wrapper, auto-detects NVMe |
| `identityInstanceStorageResolver` | Pass-through for non-NVMe instances |
| Component | Purpose |
|---|---|
| `SymlinkLunDevicePathResolver` | Resolves LUN to device via symlink path |
| `FallbackDevicePathResolver` | Generic compositor (primary → secondary resolver) |
| No new interface | Extends existing DevicePathResolver |
| | PR #396 | PR #402 |
|---|---|---|
| Operation | Discovery | Resolution |
| Question | "Which devices are instance storage?" | "What device is LUN X?" |
| Input | Count of expected devices | LUN number |
| Output | List of device paths | Single device path |
PR #396 (AWS) - Symlinks identify devices to EXCLUDE:
/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0123 → /dev/nvme0n1 (EBS, exclude this)
PR #402 (Azure) - Symlinks point TO the device you want:
/dev/disk/azure/data/by-lun/1 → /dev/nvme0n3 (use this)
| | PR #396 | PR #402 |
|---|---|---|
| External tooling | None (uses existing AWS udev rules) | Requires azure-vm-utils in stemcell |
| CPI changes | Yes (bosh-aws-cpi-release#196) | No |
| Configuration | Auto-detects NVMe instances | Opt-in via LunDeviceSymlinkPath |
| Aspect | PR #396 | PR #402 |
|---|---|---|
| Dead code | Unused `devicePathResolver` field | Clean |
| Thread safety | Potential race in lazy init | No lazy init |
| Reusability | AWS-specific | FallbackDevicePathResolver is generic |
| Bug fixes | None | Fixes NVMe regex for multi-digit partitions |
No, PR #402's lookup approach cannot be reused for AWS. The approaches are not interchangeable because:
- No LUN mapping on AWS instance storage: AWS instance storage doesn't have LUN identifiers. The CPI doesn't know which `/dev/nvme*` device will be instance storage.
- Different identification model:
  - Azure: "Here's LUN 1, find its device" (direct lookup)
  - AWS: "Here are 2 instance storage devices, find them" (discovery by exclusion)
- Different symlink purposes:
  - Azure symlinks point to what you want
  - AWS symlinks identify what to exclude
For PR #402's pattern to work on AWS, AWS would need symlinks like `/dev/disk/aws/instance-storage/0`, which don't exist.
Both PRs are complementary and can coexist:
- PR #396 adds `InstanceStorageResolver` for AWS instance storage discovery
- PR #402 adds `SymlinkLunDevicePathResolver` for Azure LUN resolution
- Different code paths, different use cases, no conflicts
| | PR #396 | PR #402 |
|---|---|---|
| Core problem | Identify instance storage among NVMe devices | Resolve LUN to NVMe device path |
| Approach | Exclusion (subtract EBS from all NVMe) | Lookup (follow LUN symlink) |
| Scope | AWS Nitro instances | Azure v6+ VMs |
| Interface | New `InstanceStorageResolver` | Existing `DevicePathResolver` |
| Reusability | AWS-specific logic | Generic fallback pattern |