LFortran's internal array descriptor is designed to be compatible with
CFI_cdesc_t (the ISO_Fortran_binding descriptor), but it currently stores
strides in element units rather than byte units. This mismatch requires
conversion at every bind(c) boundary.
This document summarizes why switching to byte-based strides is the right move.
| Compiler | Internal stride | Consequence |
|---|---|---|
| Flang | Byte-based | Descriptor is CFI_cdesc_t — zero conversion at bind(c) boundaries |
| GFortran | Element-based | Requires conversion; developers have called it a "kludge" (GCC Bug 37577) but can't change due to ABI stability |
| LFortran | Element-based | Requires copy + convert at every bind(c) call; not constrained by ABI stability |
Flang made the pragmatic choice of using byte-based strides everywhere, so its
internal descriptor is a superset of CFI_cdesc_t with no conversion layer.
GFortran carries the legacy of element-based strides and can't change without
breaking ABI. LFortran has no such constraint.
The stride mismatch forces substantial conversion machinery in
asr_to_llvm.cpp:

- On entry to a bind(c) function (~line 7140): incoming CFI byte strides are divided by the element size to produce internal element strides. For allocatable intent(in), a full descriptor copy is made first.
- On exit from a bind(c) function (~line 7262): element strides are multiplied back to byte strides via convert_bindc_strides_to_byte(), tracked through BindCStrideExit bookkeeping structs.
- When calling a bind(c) function (~line 18956): the descriptor is copied to a temporary, strides are multiplied by the element size, and CFI metadata fields are explicitly set via set_cfi_descriptor_fields().
- Special cases construct entirely new rank-0 CFI descriptors for scalars, character allocatables, and scalar allocatable/pointer arguments (~lines 18820–19177).
- Post-call fixup (~line 20654) reads back the modified base_addr from CFI descriptors for allocatable arguments.

This is real runtime overhead (loops over dimensions, memory copies, and divide/multiply operations) at every interop boundary.
The total arithmetic work is identical, just distributed differently:

- Element-based: addr = base + (Σ (i_k − lb_k) × stride_k) × elem_size. N multiplies by stride, plus one final multiply by elem_size.
- Byte-based: addr = base + Σ (i_k − lb_k) × stride_k. N multiplies by stride; elem_size is already baked into each stride.

Same number of multiplications. For contiguous arrays, element-based strides give stride₁ = 1, which lets one multiply be skipped; but byte-based strides give stride₁ = elem_size, a compile-time constant that LLVM folds anyway.
Shape and bounds queries (lbound, ubound, size) read lower_bound and extent
directly from the descriptor. Strides are not involved, so no change is needed.
With opaque pointers (modern LLVM), byte-offset arithmetic (ptr_as_i8 + byte_offset) optimizes identically to element-indexed GEP. For contiguous
arrays, LFortran's "array to data" pass already eliminates descriptors entirely,
so the stride representation is irrelevant in the hot path.
- Net effect: eliminates conversion overhead at bind(c) boundaries with no cost to normal array operations.
- Flang precedent: already uses byte-based strides successfully.
- No ABI constraint: unlike GFortran, LFortran can make this change freely.
- Simplifies codegen: removes ~500 lines of stride conversion, descriptor copying, and bookkeeping in asr_to_llvm.cpp.