- Proposal: SE-NNNN
- Authors: Zachary Waldowski
- Review Manager: TBD
- Status: Pitch
- Implementation: apple/swift-evolution-staging#NNNNN
The Standard Library offers methods for splitting a collection
based on some separator. A separator may not always be applicable.
Other standard libraries tend to offer a length-based alternative,
whether that is some constant size (`n`) or to split into every
element (`n = 1`).
Swift-Evolution threads:
Swift is growing more and better support for processing collections in ways that take advantage of its unique indexing model, but many use cases still require a drop down to manual indexing.
One such use case common in other standard libraries is splitting a sequence into subsequences based on length, as opposed to the separator-based split already in the Standard Library.
Length-based splits are useful for a few kinds of list comprehension, such as parsing text in a known format (like MAC addresses) or batch processing ("upload the photos two at a time").
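To make the "manual indexing" point concrete, here is roughly what the MAC address case looks like with today's Standard Library. This is only an illustrative sketch; the names `hex` and `octets` are placeholders and not part of the proposal.

```swift
// Split a hex string into two-character groups using manual index arithmetic.
let hex = "0123456789AB"
var octets: [Substring] = []
var start = hex.startIndex
while start != hex.endIndex {
    // Advance by two characters, clamping to the end for a short final group.
    let end = hex.index(start, offsetBy: 2, limitedBy: hex.endIndex) ?? hex.endIndex
    octets.append(hex[start..<end])
    start = end
}
print(octets.joined(separator: ":"))
// Prints "01:23:45:67:89:AB"
```

The loop is short, but the index bookkeeping and the clamping at the end are easy to get wrong, which is the kind of boilerplate a length-based split would absorb.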
Add a pair of methods, `Sequence.split(every:)` and `Collection.split(every:)`.
This closely mirrors `split(separator:maxSplits:omittingEmptySubsequences:)`
and `split(maxSplits:omittingEmptySubsequences:whereSeparator:)`, with the express
purpose of matching their mental model.
extension Sequence {
    /// Returns the longest possible subsequences of the sequence, in order,
    /// of a given length.
    ///
    /// The resulting array consists of subsequences of at most `maxLength`
    /// elements.
    ///
    /// Use this method to process a sequence in batches or chunks.
    ///
    ///     let formattedMAC = "0123456789ABCDEF"
    ///         .split(every: 2)
    ///         .joined(separator: ":")
    ///         .uppercased()
    ///     print(formattedMAC)
    ///     // Prints "01:23:45:67:89:AB:CD:EF"
    ///
    /// - Parameters:
    ///   - maxLength: The maximum number of elements in each subsequence.
    ///     If the end of the sequence has additional elements fewer than
    ///     `maxLength`, the last subsequence includes just those elements.
    /// - Returns: An array of subsequences, split from this sequence's
    ///   elements.
    /// - Complexity: O(*n*), where *n* is the length of the sequence.
    /// - Precondition: `maxLength` must be greater than zero.
    public func split(every maxLength: Int) -> [ArraySlice<Element>]
}
extension Collection {
    /// Returns the longest possible subsequences of the collection, in order,
    /// of a given length.
    ///
    /// The resulting array consists of subsequences of at most `maxLength`
    /// elements.
    ///
    /// Use this method to process a collection in batches or chunks.
    ///
    ///     let formattedMAC = "0123456789ABCDEF"
    ///         .split(every: 2)
    ///         .joined(separator: ":")
    ///         .uppercased()
    ///     print(formattedMAC)
    ///     // Prints "01:23:45:67:89:AB:CD:EF"
    ///
    /// - Parameters:
    ///   - maxLength: The maximum number of elements in each subsequence.
    ///     If the end of the collection has additional elements fewer than
    ///     `maxLength`, the last subsequence includes just those elements.
    /// - Returns: An array of subsequences, split from this collection's
    ///   elements.
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    /// - Precondition: `maxLength` must be greater than zero.
    public func split(every maxLength: Int) -> [SubSequence]
}
The implementation trivially composes on `prefix(_:)` and `suffix(from:)`.
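As a rough illustration of that composition, the `Collection` method could be written along the following lines. This is a minimal sketch under the signature given above, not the draft implementation itself; the local names `result`, `rest`, and `chunk` are placeholders.

```swift
extension Collection {
    // Sketch only: repeatedly take a prefix of at most `maxLength` elements,
    // then continue from where that prefix ended.
    func split(every maxLength: Int) -> [SubSequence] {
        precondition(maxLength > 0, "maxLength must be greater than zero")
        var result: [SubSequence] = []
        var rest = self[...]
        while !rest.isEmpty {
            let chunk = rest.prefix(maxLength)
            result.append(chunk)
            // Slices share indices with their base, so `chunk.endIndex`
            // is a valid position in `rest`.
            rest = rest.suffix(from: chunk.endIndex)
        }
        return result
    }
}

let batches = [1, 2, 3, 4, 5].split(every: 2).map { Array($0) }
print(batches)
// Prints "[[1, 2], [3, 4], [5]]"
```

Because each chunk is a `SubSequence`, no elements are copied; only the outer array of slices is allocated.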
Like `Sequence.split(separator:maxSplits:omittingEmptySubsequences:)`
and `Sequence.split(maxSplits:omittingEmptySubsequences:whereSeparator:)`,
`Sequence.split(every:)` consumes the sequence.
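The following sketch shows one way the `Sequence` method might behave, and what "consumes" means for a single-pass sequence. It is an assumption-laden illustration rather than the draft implementation: the buffering strategy and the `numbers` iterator are made up for this example.

```swift
extension Sequence {
    // Sketch only: buffer elements into chunks of at most `maxLength`.
    func split(every maxLength: Int) -> [ArraySlice<Element>] {
        precondition(maxLength > 0, "maxLength must be greater than zero")
        var result: [ArraySlice<Element>] = []
        var chunk: [Element] = []
        for element in self {
            chunk.append(element)
            if chunk.count == maxLength {
                result.append(chunk[...])
                chunk = []
            }
        }
        // Any leftover elements form a final, shorter chunk.
        if !chunk.isEmpty { result.append(chunk[...]) }
        return result
    }
}

// A single-pass sequence that yields 1 through 7 exactly once.
var next = 1
let numbers = AnyIterator { () -> Int? in
    defer { next += 1 }
    return next <= 7 ? next : nil
}

let groups = numbers.split(every: 3).map { Array($0) }
print(groups)
// Prints "[[1, 2, 3], [4, 5, 6], [7]]"

let leftover = Array(numbers)
print(leftover)
// Prints "[]"
```

After the split, iterating `numbers` again yields nothing, which matches the behavior of the existing separator-based `split` methods on single-pass sequences.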
The new methods should not introduce ambiguity with the rest of the
`split` family because they all require at least one non-defaulted, distinct
argument (`every:`, `separator:`, `whereSeparator:`).
The new methods may cause ambiguity with identical, custom versions implemented by users.
This change is additive, with no effect on ABI stability.
This change is additive, with no effect on the resilience of current API.
In particular, if we wanted to add parameters later, the methods proposed here would have to be deprecated in favor of new methods.
The draft implementation specifies `@inlinable` methods, as it trivially
composes existing API. This discourages, but does not prevent, modifying
the implementation. It also ties its fate to those other API-public members.
The authors hope to mitigate these limitations by making use of the Standard Library Preview Package.
- `split` with other external labels
- `chunks` or `chunked`
- `stride`
The alignment with the existing `split` methods in the Standard Library is
intentional and desired.
`split(maxLength:)` was proposed to more closely match `split(maxSplits:)`,
but the closeness was perceived as a possible point of confusion rather
than a source of verisimilitude.
`every` was chosen in service of fluent usage, like `suffix(from:)`.
"Split every 3" reads clearly in code and in conversation, just
as "Suffix from 7" does.
"Chunking" and "chunks" are more colloquial terms that come up more often in the context of batch processing. Other languages use "slicing", "slices", "splitting", and "splits" more often, as Swift already does.
`stride` is evoked by the `Range` family, which is related to but distinct
from `Collection`.
It could be argued that `every: 1` is a sensible default parameter, but
that does not have precedent in other languages. It's not intended as the
most common use case. Being explicit here is seen as preferable.
Other pitches take this approach.
Eager splitting follows the default behavior of the rest of the Standard Library, particularly in matching the naming convention of other collection methods.
An eager split interoperates with slicing, lazy collections, and the slicing
thereof. As subsequences are already the facility the Standard Library offers
for cheap subviews of collections, the performance difference of a fully
lazy `split` is likely small.
This proposal does not preclude a lazy implementation. In a future proposal,
it could be namespaced on `lazy` like the other lazy operators in the
Standard Library.
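For a sense of what on-demand splitting might involve, here is a small sketch built on the existing `sequence(state:next:)` function. The method name `lazilySplit(every:)` is hypothetical and exists only for this example; a real proposal would more likely add a dedicated type reachable through `lazy`, which this sketch does not attempt.

```swift
extension Collection {
    // Hypothetical helper: produces each chunk only when it is requested.
    func lazilySplit(every maxLength: Int) -> UnfoldSequence<SubSequence, Index> {
        precondition(maxLength > 0, "maxLength must be greater than zero")
        return sequence(state: startIndex) { (start: inout Index) -> SubSequence? in
            guard start != self.endIndex else { return nil }
            // Clamp to the end so the final chunk may be shorter.
            let end = self.index(start, offsetBy: maxLength, limitedBy: self.endIndex)
                ?? self.endIndex
            let chunk = self[start..<end]
            start = end
            return chunk
        }
    }
}

// Only the first chunk is computed here.
let firstBatch = [1, 2, 3, 4, 5].lazilySplit(every: 2).first { _ in true }
print(Array(firstBatch ?? []))
// Prints "[1, 2]"
```

Each chunk is still an ordinary `SubSequence`, so the main saving over the eager form is the outer array of slices.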