LFortran's internal array descriptor is designed to be compatible with
CFI_cdesc_t (the ISO_Fortran_binding descriptor), but it currently stores
strides in element units rather than byte units. This mismatch requires
conversion at every bind(c) boundary.
This document summarizes why switching to byte-based strides is the right move.
| Compiler | Internal stride | Consequence |
|---|---|---|
| Flang | Byte-based | Descriptor is CFI_cdesc_t — zero conversion at bind(c) boundaries |
| GFortran | Element-based | Requires conversion; developers have called it a "kludge" (GCC Bug 37577) but can't change due to ABI stability |
| LFortran | Element-based | Requires copy + convert at every bind(c) call; not constrained by ABI stability |
Flang made the pragmatic choice of using byte-based strides everywhere, so its
internal descriptor is a superset of CFI_cdesc_t with no conversion layer.
GFortran carries the legacy of element-based strides and can't change without
breaking ABI. LFortran has no such constraint.
The stride mismatch forces substantial conversion machinery in
asr_to_llvm.cpp:

- On entry to a bind(c) function (~line 7140): incoming CFI byte strides are divided by the element size to produce internal element strides. For allocatable intent(in), a full descriptor copy is made first.
- On exit from a bind(c) function (~line 7262): element strides are multiplied back to byte strides via convert_bindc_strides_to_byte(), tracked through BindCStrideExit bookkeeping structs.
- When calling a bind(c) function (~line 18956): the descriptor is copied to a temporary, strides are multiplied by the element size, and CFI metadata fields are explicitly set via set_cfi_descriptor_fields().
- Special cases construct entirely new rank-0 CFI descriptors for scalars, character allocatables, and scalar allocatable/pointer arguments (~lines 18820–19177).
- Post-call fixup (~line 20654) reads back the modified base_addr from CFI descriptors for allocatable arguments.

This is real runtime overhead (loops over dimensions, memory copies, and divide/multiply operations) at every interop boundary.
The total arithmetic work is identical, just distributed differently:

- Element-based: addr = base + (Σ (i_k − lb_k) × stride_k) × elem_size. N multiplies by stride, plus one final multiply by elem_size.
- Byte-based: addr = base + Σ (i_k − lb_k) × stride_k. N multiplies by stride; elem_size is already baked into each stride.

Same number of multiplications. For contiguous arrays, element-based strides give stride₁ = 1, which lets one multiply be skipped; but byte-based strides give stride₁ = elem_size, a compile-time constant that LLVM folds anyway.
Shape and bounds queries (lbound, ubound, size) read lower_bound and extent
directly from the descriptor. Strides are not involved, so no change is needed.
With opaque pointers (modern LLVM), byte-offset arithmetic (ptr_as_i8 + byte_offset) optimizes identically to element-indexed GEP. For contiguous
arrays, LFortran's "array to data" pass already eliminates descriptors entirely,
so the stride representation is irrelevant in the hot path.
- Net effect: eliminates conversion overhead at bind(c) boundaries with no cost to normal array operations.
- Flang precedent: already uses byte-based strides successfully.
- No ABI constraint: unlike GFortran, LFortran can make this change freely.
- Simplifies codegen: removes ~500 lines of stride conversion, descriptor copying, and bookkeeping in asr_to_llvm.cpp.