- Author: Michel Fortin
This proposal introduces the concept of pure functions to Swift. A pure function is guaranteed to allow only value semantics, even when dealing with classes.
Because pure functions are not allowed to mutate the global state, you don't have to worry about far-reaching side effects. Pure functions are also automatically thread-safe in a way that is provable by the compiler. And pure functions and properties in a class give you copy-on-write semantics almost automatically.
This document is a work in progress. There is no implementation at this time.
Understanding code is how we avoid bugs. Code that is complex becomes even more complex when you consider that each function has an unlimited number of variables it can affect or depend on in the global scope. When code is complex, mistakes happen and bugs ensue, with varying consequences.
Thread safety with shared data is hard to prove and error-prone. Besides the unlimited supply of global variables each function could mistakenly touch, data passed in arguments to functions running in background queues often remains accessible to the queue that requested the function to be called in the first place, leaving open the possibility of a data race.
One way to limit data races without necessarily copying huge chunks of data when passing it to another queue is to use copy-on-write. Copy-on-write implements value semantics without the need to copy the data every time. A copy is made only when the data is mutated and only if there are other references to it (otherwise the copy is unnecessary). Standard library types like Array and String use copy-on-write under the hood. But copy-on-write is unintuitive to implement and unchecked by the compiler. An error in the implementation will break value semantics and could result in data races and memory corruption if used concurrently.
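As an illustration of the manual pattern described above, here is a minimal sketch of hand-written copy-on-write in today's Swift. The names IntBuffer and CowStack are hypothetical and exist only for this example; the only standard library call relied upon is isKnownUniquelyReferenced.

```swift
// Reference-type backing storage (hypothetical name, for illustration only).
final class IntBuffer {
    var items: [Int]
    init(items: [Int]) { self.items = items }
}

// A struct with value semantics built on top of a shared buffer.
struct CowStack {
    private var buffer = IntBuffer(items: [])

    var items: [Int] { buffer.items }

    mutating func push(_ value: Int) {
        // Copy the storage only if another value also references it.
        if !isKnownUniquelyReferenced(&buffer) {
            buffer = IntBuffer(items: buffer.items)
        }
        buffer.items.append(value)
    }
}
```

Nothing forces push to perform the uniqueness check: forgetting it still compiles, but silently breaks value semantics. That is the kind of mistake the compiler cannot catch today.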
Whether the data is in a global variable or behind a class reference, sharing data is potentially risky. But even if you decide to deep copy the data structure before passing it to a function, the compiler can't check that every part was deeply copied either.
Pure functions guarantee provable value semantics for whatever they deal with. They do not affect the global state of the program, cannot affect other parts of the program through class references, and have reproducible outputs given the same inputs. Pure functions must follow certain rules, all of them checked by the compiler.
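To make "reproducible outputs given the same inputs" concrete, here is a small sketch in today's Swift, without the proposed attribute. The names offset, addOffset, and addOne are made up for this example.

```swift
// Global mutable state (hypothetical, for illustration only).
var offset = 1

// Depends on a hidden global: the same argument can give different results.
func addOffset(_ x: Int) -> Int {
    return x + offset
}

// Depends only on its parameter: the same argument always gives the same result.
func addOne(_ x: Int) -> Int {
    return x + 1
}
```

Under the rules proposed in this document, only addOne would be eligible to be marked pure.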
A function, property, or subscript is made pure by prefixing it with the pure attribute:
struct S {
    pure func foo() {}
    pure var number: Int
    pure subscript(index: Int) -> Int { return 1 }
}
When a function has the pure attribute, the compiler enforces that the implementation of the function follows the rules below. As a shortcut, you can declare a type pure to make all of its members pure:
pure struct S {
    func foo() {}
    var number: Int
    subscript(index: Int) -> Int { return 1 }
}
Doing this will not prevent extensions from attaching extra non-pure methods.
(The rules for everything but class.)
The following rules make sure that pure functions have no access to the global mutable state (global variables).
A pure function or initializer cannot access variables not passed as a parameter (note that self is an implicit parameter). Also, it can only call other functions, getters, setters for properties and subscripts, and initializers if they are themselves pure.
var globalVar = 1
pure func increment(_ param: Int) -> Int {
    var localVar = param
    localVar += 1 // this is fine, has no effect on the outside world
    localVar += globalVar // error: globalVar is not pure, cannot read it from a pure function
    return localVar
}
A pure function or initializer can take inout arguments and it can throw. Those are just special ways of returning a value that do not affect the predictability of the output.
var globalVar = 1
pure func increment(_ param: inout Int) {
    param += 1 // allowed
}
increment(&globalVar) // the context here is non-pure, we are allowed to mutate globalVar
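The claim that inout and throws are just special ways of returning a value can be sketched in today's Swift by writing each form next to an equivalent that returns an ordinary value. The names below (incrementInPlace, incremented, ParseError, parse, parseAsResult) are hypothetical.

```swift
// inout form: the result is written back through the parameter.
func incrementInPlace(_ param: inout Int) {
    param += 1
}

// Equivalent form: the result is returned as a value.
func incremented(_ param: Int) -> Int {
    return param + 1
}

enum ParseError: Error { case notANumber }

// Throwing form: the error is an alternate return path.
func parse(_ text: String) throws -> Int {
    guard let number = Int(text) else { throw ParseError.notANumber }
    return number
}

// Equivalent form: success or failure is returned as a Result value.
func parseAsResult(_ text: String) -> Result<Int, Error> {
    return Result(catching: { try parse(text) })
}
```

In both pairs the output is fully determined by the arguments, which is why neither feature conflicts with the rules above.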
A pure function can have default arguments like a regular function, but the expression for the default value must itself follow the rules of a pure function.
The getter and setter for a pure property or subscript must be pure, following the rules for pure functions.
A pure stored property can be a member of a struct or enum, a local variable, or an immutable global variable (let). Its didSet and willSet blocks must be pure, if present, following the rules for pure functions. pure var is not allowed at global scope except for computed properties.
struct A {
    // a pure global constant
    static pure let defaultNumber = 8
    // a pure stored property
    pure var number: Int
    // a pure computed property
    pure var text: String {
        get {
            return String(number)
        }
        set {
            number = Int(newValue) ?? A.defaultNumber
        }
    }
    // didSet/willSet must be pure in a pure property
    pure var name: String {
        didSet {
            // error: NSApplication.shared is not pure, cannot access from pure function
            NSApplication.shared.mainWindow?.title = name
        }
    }
}
Because classes are reference types and there can be multiple references pointing to the same object, special rules apply to them. Specifically, the criterion for pure is that writing to pure stored properties in a class requires the class to be uniquely referenced. This preserves value semantics and enables copy-on-write.
In a class, some functions can be cloning. Writing to a pure stored property inside a class is only possible from a cloning function. Property setters for pure properties are implicitly cloning. When a function is cloning, the self parameter is passed as inout, allowing the function to replace the object instance with another one when necessary.
This is similar to how mutating works for structs.
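For readers less familiar with the struct side of this analogy, here is a short sketch in today's Swift showing that a mutating method receives self as inout and may replace it entirely. The Tally type is hypothetical.

```swift
struct Tally {
    var count = 0

    // `self` is implicitly inout here, so its properties can be modified...
    mutating func increment() {
        count += 1
    }

    // ...and the whole value can be replaced with a new one.
    mutating func reset() {
        self = Tally()
    }
}
```

A cloning function, as proposed, would give class methods the same ability to swap self for a fresh instance.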
@objc is incompatible with cloning. An object derived from an Objective-C base class can have cloning members, but those members are only accessible from Swift since Objective-C does not support passing self as inout.
A pure function in a class follows the same rules as other pure functions.
A pure cloning function of a class takes the class reference as inout. This will allow checking whether the reference is unique. Calling a cloning function does not trigger this check until later when a stored property is written to.
A property or subscript can be pure if both its getter and setter follow the rules of pure functions.
A pure stored property has a pure getter and a pure cloning setter. Calling the setter of a pure stored property will automatically verify that the object is uniquely referenced using isKnownUniquelyReferenced. If the reference is not unique and the object conforms to the CopyableObject protocol, a copy of the object is assigned to self before assigning the new value. If the object does not conform to CopyableObject and the reference is not unique, this is a fatal error. More on CopyableObject later.
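The setter behavior described above could be approximated by hand in today's Swift roughly as follows. This is only a sketch under assumptions: the protocol is redeclared here without the pure attribute (which does not exist yet), and Document and setTitle are hypothetical names standing in for what the compiler would generate.

```swift
// Stand-in for the proposal's protocol, minus the `pure` attribute.
protocol CopyableObject: AnyObject {
    init(copying object: Self)
}

final class Document: CopyableObject {
    var title: String
    init(title: String) { self.title = title }
    init(copying object: Document) { self.title = object.title }
}

// Roughly what a cloning setter for `title` might expand to:
// the reference is passed inout so it can be replaced by a copy.
func setTitle(_ newValue: String, on object: inout Document) {
    if !isKnownUniquelyReferenced(&object) {
        object = Document(copying: object)
    }
    object.title = newValue
}
```

When a second reference exists, the write lands on a fresh copy and the other reference keeps seeing the old value. When the reference is unique, the write happens in place with no copy.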
The setter of a pure property or subscript in a class is assumed to be cloning (taking self as inout) unless explicitly marked as noncloning.
deinit is called whenever an object's reference count reaches zero. While this can happen in the middle of a pure function, tasks performed by deinit typically can't be pure. In fact, making deinit pure would only guarantee that it doesn't accomplish anything, thanks to value semantics (the only value it can affect is self, which is about to disappear anyway).
FIXME: So we have to tolerate that deinit could do something to compromise the purity of the function, like writing to a global variable. But even if we tolerate this possible breach of value semantics, this could pose a problem for thread safety (deinit being called from the wrong thread).
Classes with pure cloning members may conform to the CopyableObject protocol. This protocol defines an initializer to be called when there is more than one reference to the object and we need a uniquely referenced copy to write to.
protocol CopyableObject: class {
    pure required init(copying object: Self)
}
A default implementation copying all the pure stored properties could be synthesized by the compiler. It can't copy non-pure stored properties, however, because as a pure initializer it does not have access to them. In the presence of non-pure stored properties, no implementation is synthesized.
Note that there is no requirement that this be a deep copy. References to other classes can be shared between instances because the rules above guarantee that objects referenced by stored properties in the class will either have copy-on-write behavior or not be accessible from pure functions.
Note that a non-synthesized initializer could break purity by not correctly copying the pure properties.
The Ownership Manifesto describes a Copyable protocol which gives an object the ability to copy its value from one variable to another. This is a different concept from the CopyableObject protocol described here, which is a means for the compiler to provide copy-on-write behavior automatically.
Overriding a function or property cannot weaken the pure guarantee made in the base class, but it can strengthen it. For instance, you can override a non-pure function with a pure function. Note however that a pure function is not allowed to call its superclass's non-pure implementation.
A closure is implicitly pure if it only calls pure functions, getters, setters, or subscripts. You can make this explicit by prefixing the parameter list with pure:
let closure = { pure (a: Int) -> Int in
    return a + 1
}
In some cases a function will be pure in itself, but will have to call other functions passed through parameters. The purity of the function then depends on the purity of the passed arguments. For instance, here is a function taking a closure provided by the caller:
pure? func callMeBack(callback: pure? () -> Void) {
    callback()
}
When the callback parameter is pure, the callMeBack function is pure too. The question mark after pure means that the function is conditionally pure depending on the argument passed to it. The syntax proposed here works similarly to throws and rethrows.
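For comparison, here is how the existing rethrows mechanism behaves in today's Swift. This sketch reuses the callMeBack name from the example above but gives it an Int result so the effect is observable; CallbackError is a hypothetical error type.

```swift
enum CallbackError: Error { case failed }

// The function only throws if the closure passed to it throws.
func callMeBack(_ callback: () throws -> Int) rethrows -> Int {
    return try callback()
}

// With a non-throwing closure, the call needs no `try`.
let plainResult = callMeBack({ 42 })

// With a throwing closure, the call must be marked `try` and handled.
func callWithFailure() -> Bool {
    do {
        _ = try callMeBack({ throw CallbackError.failed })
        return false
    } catch {
        return true // the closure's error propagated through callMeBack
    }
}
```

Conditional purity would follow the same shape: the effect of the function at the call site is determined by the effect of the argument.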
If the function takes a generic type, it can constrain its pure attribute to match one or more members of that generic type. In the following example, whether altHashValue is a pure function depends on whether T.hashValue is pure:
pure? func altHashValue<T: Hashable>(_ a: T) -> Int where T.hashValue: pure? {
    return a.hashValue &+ 1
}
The where T.hashValue: pure? clause constrains the altHashValue function to be pure if T.hashValue is a pure property. The compiler must check that the whole body of the function follows the rules for a pure function, but makes an exception for T.hashValue. At the call site, the function is deemed pure only if T.hashValue is pure.
If instead of being conditionally pure we wanted altHashValue to always be pure, the constraint can express that as well, by removing the question mark from the pure attribute in front of altHashValue and removing the question mark in the constraint:
pure func altHashValue<T: Hashable>(_ a: T) -> Int where T.hashValue: pure {
    return a.hashValue &+ 1
}
Here the constraint tells us that T.hashValue must absolutely be pure for this function to be called.
For cases where a function is known to be pure but the compiler can't prove that it is, the standard library provides a wrapper function for calling non-pure functions. This should be particularly useful when wrapping code from other languages:
let result = unsafePure { someExternalFunctionInC() }
unsafePure works with conditional purity too:
pure? func callMeBack(callback: pure? @convention(c) () -> Void) {
    unsafePure {
        someExternalFunctionInC(callback)
    }
}
Pure functions are automatically thread-safe since they do not have access to any shared mutable state.
It is safe to pass an object to a pure function running in another thread while continuing to use it in the current thread. This is because only the pure (non-cloning) members of the object are accessible to the pure function. The current thread is free to mutate the non-pure portions of the object.
Weak references mutate automatically when the object at the other end vanishes. Stored properties containing weak references are thus not allowed to be pure, and are inaccessible to pure functions.
Since a pure function might stop referencing an object, the object might get deallocated, causing weak references to that object to become nil. A pure function is normally not allowed to have side effects, but non-pure functions can observe the effect of the deallocation. Thread safety is still preserved.
There is another noteworthy side effect of weak references regarding value semantics. Since isKnownUniquelyReferenced does not take weak references into account, a weak reference to an object could be used to observe the value of pure stored properties as they change, either from another thread or between successive calls to a pure function in the same thread. This could be fixed by having isKnownUniquelyReferenced return false in the presence of weak references.
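The interaction described above can be reproduced in today's Swift. In this sketch (Box and weakReferenceObservesWrite are hypothetical names), a weak reference does not prevent the uniqueness check from succeeding, so an in-place write remains visible through it.

```swift
final class Box {
    var value: Int
    init(_ value: Int) { self.value = value }
}

func weakReferenceObservesWrite() -> (wasUnique: Bool, observed: Int?) {
    var box = Box(1)
    weak var spy: Box? = box

    // Only strong references are counted, so this reports uniqueness
    // even though `spy` also points at the object.
    let wasUnique = isKnownUniquelyReferenced(&box)

    // A copy-on-write setter would therefore write in place...
    if wasUnique { box.value = 2 }

    // ...and the weak reference sees the new value.
    let observed = spy?.value

    // Keep the object alive until the observation above has been made.
    withExtendedLifetime(box) {}
    return (wasUnique, observed)
}
```

Here the holder of spy sees the value change from 1 to 2 without ever having been handed a mutable reference, which is the breach of value semantics this section is concerned with.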
Changing a non-pure function to a pure one in a future version of a library is allowed. A pure function that becomes non-pure is a breaking change, however, so this should be avoided.
There is an exception however for open members in classes: changing them from non-pure to pure is not allowed as this would break overrides defined in other modules.
To be determined.
While this feature is purely additive, it is expected that library authors will get significant pressure to make APIs pure everywhere possible so that their user base can make use of pure themselves. This might be a significant burden, especially for libraries with downstream dependencies not yet annotated for pure or libraries with code in other languages.
Given that the compiler is able to check if pure is valid for every function, we should offer a tool capable of proposing adding pure to declarations whenever they are eligible, in other words when all their dependencies are pure. A flag causing the compiler to emit those suggestions would work perfectly. Every time a library dependency is updated for pure, the tool could be re-run to see if more things can become pure.
Because pure is a commitment when it comes to public APIs, library authors should review more carefully the annotations suggested by the tool for anything that is public or open.
APIs imported from other languages can't be automatically checked for pure eligibility by the compiler. They would need to be annotated using a mechanism similar to nullability. The user on the Swift side can use unsafePure to wrap any external call if they are confident the function respects pure semantics.
We could redefine pure simply as being thread-safe. This would have no impact on the mechanics in this proposal, but it would make it legal to use unsafePure to wrap any thread-safe code that does not follow the same-input-for-same-output requirement of a pure function. With this change pure would denote provably thread-safe code and unsafePure would be code that is assumed to be safe regardless of what the compiler says.
Regardless of what this document says about the semantics of pure, there is a possibility that this is how pure and unsafePure would end up being used in practice. The incentives to use pure this way are easy to see: auditing a few uses of unsafePure is much easier than auditing a bigger body of code. The more code you have that is guaranteed to be thread-safe with pure, the less you have to worry about thread safety.
A nice side effect of not requiring value semantics would be to allow pure deinitializers to do something useful. Deinitializers could be marked pure and accomplish things like closing file handles in a guaranteed thread-safe manner. Another effect of this is that properties with weak references would be allowed to be pure because the way they mutate is thread-safe.
Relaxing the definition of pure in this way would increase the annotation burden for existing code since there is a larger set of functions that are thread-safe.
If we change this, we'll need a more representative name than pure.
The Task-based concurrency manifesto discusses the concept of actors. Actors attempt to isolate some code to be run independently on a queue. Isolation is limited however, since actors operate in the same memory space as the rest of the program and have access to everything.
We could improve this by allowing a pure actor, an actor where every member is pure. Such an actor would benefit from compiler checked data isolation and would be suitable for computation tasks that do not require access to the outside world.
If however we break away from value semantics (as described in the previous section) it'd become possible to use a shared database, access the file system, and communicate through the network in a pure actor instead of being limited to pure computation tasks.
In theory, the rules for pure make sure that for the same input you'll always get the same output. So the compiler could allow elision of repeated calls with the same argument by reusing the result from a preceding call.
This property cannot be proven whenever unsafePure is used. Normally, a function wrapped in unsafePure is supposed to be pure even if this is not verifiable by the compiler, but there are incentives to break this rule in some cases (described in the two preceding sections). It might be more prudent to disallow this optimization in general so that things work in a more predictable way when unsafePure is used in a way that breaks the semantic guarantees pure is supposed to offer.
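To show why the same-input-same-output property matters for this kind of call elision, here is a library-level analogue in today's Swift: a memoizing wrapper that reuses a previous result instead of calling the function again. The memoize helper is hypothetical and is not part of this proposal; it only illustrates the reasoning.

```swift
// Wraps a function so that repeated calls with the same argument
// reuse the first result instead of invoking the function again.
func memoize<Input: Hashable, Output>(
    _ function: @escaping (Input) -> Output
) -> (Input) -> Output {
    var cache: [Input: Output] = [:]
    return { input in
        if let cached = cache[input] {
            return cached
        }
        let result = function(input)
        cache[input] = result
        return result
    }
}
```

This transformation preserves behavior only when the wrapped function is actually pure. If the function reads mutable state or has side effects, as code hidden behind unsafePure might, the cached result can be stale and the skipped side effects are lost.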
As a special case, open functions and properties in classes cannot become pure in a future version of a library; this would break compatibility with existing overrides outside of the module. We could make it a tristate where the default allows maximum compatibility for future versions of the library:
- open: calling is not pure, but overrides must be pure (maximum future compatibility, less usefulness)
- open pure: calling is pure, overrides must be pure
- open nonpure: calling is not pure, overrides may or may not be pure
That would however be a major source-breaking change for libraries, as anything currently open would have to be relabeled open nonpure for the current overrides to continue to work. Automatic migration is a possibility. Another would be to keep the current meaning of open and use another word for the maximum compatibility option:
- open strict: calling is not pure, but overrides must be pure (maximum future compatibility, less usefulness)
- open pure: calling is pure, overrides must be pure
- open: calling is not pure, overrides may or may not be pure
The utility of having strict when it is not the default is up for debate however.
Pure could be inferred within the module for members that do not have an explicit pure attribute set using the following rules:
- Any let variable is implicitly pure.
- A function is implicitly pure if it does not call any non-pure function.
- A computed property is implicitly pure if its implementation does not call any non-pure function.
- A stored property is implicitly pure if it is immutable (let) or is a mutable (var) member of a struct, enum, or a local variable.
- Specifically excluded from the previous rule are global mutable stored properties and mutable stored properties in class instances.
Implicit pure is not visible to other modules. Public functions must be explicitly marked pure as a commitment that the implementation will stay pure in future revisions of the module.
Implicit pure has some drawbacks. While it might improve progressive disclosure by allowing functions to be used where they need to be pure without explicit annotations, this means that a small change in the implementation of a function can make it non-pure and cause a cascade of functions depending on the first one to become non-pure, causing hard to decipher errors far from where the change took place.
So this proposal only proposes that implicit pure apply to closures. The same rules as above could be used by a migration tool to annotate pure functions however.
Another possibility is to disallow classes with deinitializers from being used inside of pure functions. This would end up limiting us to final classes however, since you can't prove that all derived classes of a base class will have no deinitializer.