@zwaldowski
Last active March 15, 2020 16:58

Collection Splitting By Length

Introduction

The Standard Library offers methods for splitting a collection based on some separator, but a separator is not always applicable. Other standard libraries tend to offer a length-based alternative, whether that splits into pieces of some constant size (n) or into individual elements (n = 1).

Swift-Evolution threads:

Motivation

Swift is growing more and better support for processing collections in ways that take advantage of its unique indexing model, but many use cases still require dropping down to manual indexing.

One such use case common in other standard libraries is splitting a sequence into subsequences based on length, as opposed to the separator-based split already in the Standard Library.

Length-based splits are useful for a few kinds of list processing, such as parsing text in a known format (like MAC addresses) or batch processing ("upload the photos two at a time").
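For context, the following is a rough sketch of the manual indexing these use cases tend to require today. The sample data (`photos`, `mac`) and variable names are made up for illustration. The first form relies on Array's zero-based integer indices; the second is what a collection with opaque indices, such as String, typically needs.

```swift
let photos = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
let batchSize = 2

// Works only because Array has integer, zero-based indices.
let batches = stride(from: 0, to: photos.count, by: batchSize).map { start in
    photos[start ..< min(start + batchSize, photos.count)]
}

// A collection with opaque indices needs explicit index arithmetic.
let mac = "0123456789AB"
var pairs: [Substring] = []
var start = mac.startIndex
while start != mac.endIndex {
    // Advance by two characters, clamping at the end of the string.
    let end = mac.index(start, offsetBy: 2, limitedBy: mac.endIndex) ?? mac.endIndex
    pairs.append(mac[start..<end])
    start = end
}
print(pairs.joined(separator: ":"))
// Prints "01:23:45:67:89:AB"
```

Both forms are easy to get subtly wrong at the final, shorter batch, which is the boilerplate a length-based split would absorb.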

Proposed solution

Add a pair of methods, Sequence.split(every:) and Collection.split(every:).

This closely mirrors split(separator:maxSplits:omittingEmptySubsequences:) and split(maxSplits:omittingEmptySubsequences:whereSeparator:), with the express purpose of matching their mental model.

Detailed design

extension Sequence {

    /// Returns the longest possible subsequences of the sequence, in order,
    /// of a given length.
    ///
    /// The resulting array consists of subsequences of at most `maxLength`
    /// elements.
    ///
    /// Use this method to process a sequence in batches or chunks.
    ///
    ///     let formattedMAC = "0123456789ABCDEF"
    ///         .split(every: 2)
    ///         .joined(separator: ":")
    ///         .uppercased()
    ///     print(formattedMAC)
    ///     // Prints "01:23:45:67:89:AB:CD:EF"
    ///
    /// - Parameters:
    ///   - maxLength: The maximum number of elements in each subsequence.
    ///     If the end of the sequence has additional elements fewer than
    ///     `maxLength`, the last subsequence includes just those elements.
    /// - Returns: An array of subsequences, split from this sequence's
    ///   elements.
    /// - Complexity: O(*n*), where *n* is the length of the sequence.
    /// - Precondition: `maxLength` must be greater than zero.
    public func split(every maxLength: Int) -> [ArraySlice<Element>]

}

extension Collection {

    /// Returns the longest possible subsequences of the collection, in order,
    /// of a given length.
    ///
    /// The resulting array consists of subsequences of at most `maxLength`
    /// elements.
    ///
    /// Use this method to process a collection in batches or chunks.
    ///
    ///     let formattedMAC = "0123456789ABCDEF"
    ///         .split(every: 2)
    ///         .joined(separator: ":")
    ///         .uppercased()
    ///     print(formattedMAC)
    ///     // Prints "01:23:45:67:89:AB:CD:EF"
    ///
    /// - Parameters:
    ///   - maxLength: The maximum number of elements in each subsequence.
    ///     If the end of the collection has additional elements fewer than
    ///     `maxLength`, the last subsequence includes just those elements.
    /// - Returns: An array of subsequences, split from this collection's
    ///   elements.
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    /// - Precondition: `maxLength` must be greater than zero.
    public func split(every maxLength: Int) -> [SubSequence]

}

The implementation is a trivial composition of prefix(_:) and suffix(from:).
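As a minimal sketch of how such a composition might look for the Collection variant (this is not the draft implementation itself, and it omits `public` and `@inlinable`), each pass takes a prefix of at most `maxLength` elements and continues from where that prefix ended:

```swift
extension Collection {
    /// Sketch only: repeatedly peels a prefix off the remaining slice.
    func split(every maxLength: Int) -> [SubSequence] {
        precondition(maxLength > 0, "maxLength must be greater than zero")
        var result: [SubSequence] = []
        var remainder = self[...]
        while !remainder.isEmpty {
            // prefix(_:) clamps to the available elements, so the last
            // chunk may be shorter than maxLength.
            let chunk = remainder.prefix(maxLength)
            result.append(chunk)
            remainder = remainder.suffix(from: chunk.endIndex)
        }
        return result
    }
}

let formattedMAC = "0123456789ABCDEF"
    .split(every: 2)
    .joined(separator: ":")
print(formattedMAC)
// Prints "01:23:45:67:89:AB:CD:EF"
```

Because each chunk is a SubSequence, the elements are not copied; the result shares storage with the original collection.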

Like Sequence.split(separator:maxSplits:omittingEmptySubsequences:) and Sequence.split(maxSplits:omittingEmptySubsequences:whereSeparator:), Sequence.split(every:) consumes the sequence.
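To illustrate what consuming the sequence means here, the following is one possible sketch of the Sequence variant. It assumes a simple buffering strategy, which may differ from the draft implementation; the `numbers` sequence is a made-up single-pass example.

```swift
extension Sequence {
    /// Sketch only: buffers elements into arrays of at most `maxLength`.
    func split(every maxLength: Int) -> [ArraySlice<Element>] {
        precondition(maxLength > 0, "maxLength must be greater than zero")
        var result: [ArraySlice<Element>] = []
        var buffer: [Element] = []
        for element in self {
            buffer.append(element)
            if buffer.count == maxLength {
                result.append(buffer[...])
                buffer = []
            }
        }
        // Any leftover elements form a final, shorter subsequence.
        if !buffer.isEmpty {
            result.append(buffer[...])
        }
        return result
    }
}

// A single-pass sequence that yields 1 through 5.
let numbers = sequence(state: 0) { (n: inout Int) -> Int? in
    n += 1
    return n <= 5 ? n : nil
}
let numberBatches = numbers.split(every: 2)
print(numberBatches.map { Array($0) })
// Prints "[[1, 2], [3, 4], [5]]"
```

The whole sequence is iterated once, so a single-pass sequence should not be expected to yield its elements again afterwards.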

Source compatibility

The new methods should not introduce ambiguity with the rest of the split family because they all require at least one non-defaulted, distinct argument (every:, separator:, whereSeparator:).

The new methods may cause ambiguity with identical, custom versions implemented by users.

Effect on ABI stability

This change is additive, with no effect on ABI stability.

Effect on API resilience

This change is additive, with no effect on the resilience of current API.

In particular, parameters could not be added to these methods later; the methods would have to be deprecated in favor of new ones instead.

The draft implementation specifies @inlinable methods, as they trivially compose existing API. This discourages, but does not prevent, modifying the implementation. It also ties their fate to those other API-public members.

The authors hope to mitigate these limitations by making use of the Standard Library Preview Package.

Alternatives considered

Other spellings

  • split with other external labels
  • chunks or chunked
  • stride

The alignment with the existing split methods in the Standard Library is intentional and desired.

split(maxLength:) was proposed to more closely match split(maxSplits:), but the closeness was perceived as a possible point of confusion rather than a source of verisimilitude.

every was chosen in service of fluent usage, like suffix(from:). "Split every 3" reads clearly in code and in conversation, just as "Suffix from 7" does.

"Chunking" and "chunks" are more colloquial terms that come up more often in terms of batch processing. Other languages use "slicing", "slices", "splitting", and "splits" more often, as Swift already does.

stride is evoked by the Range family, which is related to but distinct from Collection.

Default for maxLength

It could be argued that every: 1 is a sensible default parameter, but that does not have precedent in other languages. It's not intended as the most common use case. Being explicit here is seen as preferable.

Lazy by default

Other pitches take this approach.

Eager splitting follows the default behavior of the rest of the Standard Library, and in particular matches the naming convention of other collection methods.

An eager split interoperates with slicing, lazy collections, and the slicing thereof. As subsequences are already the facility the Standard Library offers for cheap subviews of collections, the performance difference of a fully lazy split is likely small.

This proposal does not preclude a lazy implementation. In a future proposal, it could be namespaced on lazy like the other lazy operators in the Standard Library.
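As a rough illustration of what such a future addition could look like, the sketch below hangs a lazy variant off `lazy`. The `ChunkIterator` type and this particular signature are hypothetical and not part of the proposal; a real design would more likely vend a lazy collection than a single-pass sequence.

```swift
/// Hypothetical helper: yields one chunk of the base collection per call.
struct ChunkIterator<Base: Collection>: IteratorProtocol, Sequence {
    var remainder: Base.SubSequence
    let maxLength: Int

    mutating func next() -> Base.SubSequence? {
        guard !remainder.isEmpty else { return nil }
        let chunk = remainder.prefix(maxLength)
        remainder = remainder.suffix(from: chunk.endIndex)
        return chunk
    }
}

extension LazySequenceProtocol where Elements: Collection {
    /// Sketch only: chunks are computed on demand as the result is iterated.
    func split(every maxLength: Int) -> LazySequence<ChunkIterator<Elements>> {
        precondition(maxLength > 0, "maxLength must be greater than zero")
        return ChunkIterator<Elements>(remainder: elements[...], maxLength: maxLength).lazy
    }
}

// Only the chunks up to the first match are ever formed.
let firstWithFour = Array(1...7).lazy
    .split(every: 3)
    .first(where: { $0.contains(4) })
print(firstWithFour.map { Array($0) } ?? [])
// Prints "[4, 5, 6]"
```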
