Pure for guaranteed value semantics

Author: Michel Fortin

Introduction

This proposal introduces the concept of pure functions to Swift. A pure function is guaranteed to allow only value semantics, even when dealing with classes.

Because pure functions are not allowed to mutate the global state, you don't have to worry about far-reaching side effects. Pure functions are also automatically thread-safe in a way that is provable by the compiler. And pure functions and properties in a class give you copy-on-write semantics almost automatically.

This document is a work in progress. There is no implementation at this time.

Motivation

Understanding code is how we avoid bugs. Code that is complex becomes even more complex when you consider that each function has an unlimited number of variables it can affect or depend on in the global scope. When code is complex, mistakes happen and bugs ensue, with varying consequences.

Thread safety with shared data is hard to prove and error-prone. Besides the unlimited supply of global variables each function could mistakenly touch, data passed in arguments to functions running in background queues often remains accessible to the queue that requested the function to be called in the first place, leaving open the possibility of a data race.

One way to limit data races without necessarily copying huge chunks of data when passing it to another queue is to use copy on write. Copy on write implements value semantics without the need to copy the data every time. A copy is created only when mutated and only if there are other references to it (otherwise the copy is unnecessary). Standard library types like Array and String use copy-on-write under the hood. But copy on write is unintuitive to implement and unchecked by the compiler. An error in the implementation will break value semantics and could result in data races and memory corruption if used concurrently.
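To make the cost concrete, here is a minimal sketch of hand-written copy-on-write in today's Swift. The names `Storage` and `Numbers` are hypothetical; only `isKnownUniquelyReferenced` comes from the standard library. Nothing forces `append` to perform the uniqueness check, which is the kind of unchecked step described above.

```swift
// Reference-type backing storage shared between copies of the value type.
final class Storage {
    var items: [Int]
    init(items: [Int]) { self.items = items }
}

// Hypothetical value type implementing copy-on-write by hand.
struct Numbers {
    private var storage = Storage(items: [])

    var items: [Int] { storage.items }

    mutating func append(_ item: Int) {
        // Forgetting this check would silently break value semantics:
        // the compiler does not verify it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(items: storage.items)
        }
        storage.items.append(item)
    }
}

var a = Numbers()
a.append(1)
var b = a      // shares storage with `a`, no copy yet
b.append(2)    // storage is shared, so `b` copies before writing
```

After these calls `a` still holds only its own element, while `b` holds both.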

Whether the data is in a global variable or behind a class reference, sharing data is potentially risky. But even if you decide to deep copy the data structure before passing it to a function, the compiler can't check that every part was deeply copied either.

Proposed Solution

Pure functions guarantee provable value semantics for whatever they deal with. They do not affect the global state of the program, cannot affect other parts of the program through class references, and have reproducible outputs given the same inputs. Pure functions must follow certain rules, all of them checked by the compiler.

A function, property, or subscript is made pure by prefixing it with the pure attribute:

struct S {
	pure func foo() {}
	pure var number: Int
	pure subscript(index: Int) -> Int { return 1 }
}

When a function has the pure attribute, the compiler enforces that the implementation of the function follows the rules below. As a shortcut, you can declare a type pure to make all of its members pure:

pure struct S {
	func foo() {}
	var number: Int
	subscript(index: Int) -> Int { return 1 }
}

Doing this will not prevent extensions from attaching extra non-pure methods.

Globals, locals, struct, and enum

(The rules for everything but class.)

The following rules make sure that pure functions have no access to the global mutable state (global variables).

Functions & initializers

A pure function or initializer cannot access variables not passed as a parameter (note that self is an implicit parameter). Also, it can only call other functions, getters, setters for properties and subscripts, and initializers if they are themselves pure.

var globalVar = 1

pure func increment(_ param: Int) -> Int {
	var localVar = param
	localVar += 1 // this is fine, has no effect on the outside world
	localVar += globalVar // error: globalVar is not pure, cannot access from a pure function
	return localVar
}

A pure function or initializer can take inout arguments and it can throw. Those are just special ways of returning a value that do not affect the predictability of the output.

var globalVar = 1

pure func increment(_ param: inout Int) {
	param += 1 // allowed
}

increment(&globalVar) // the context here is non-pure, we are allowed to mutate globalVar
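The claim that inout is just another way of returning a value can be checked in today's Swift, without the pure attribute. In this small sketch, `incremented` is a hypothetical counterpart added for comparison; both forms compute the same result from the same input.

```swift
// Mutation through an inout parameter...
func increment(_ value: inout Int) {
    value += 1
}

// ...expresses the same computation as returning a new value.
func incremented(_ value: Int) -> Int {
    return value + 1
}

var x = 1
increment(&x)

var y = 1
y = incremented(y)
```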

Default Arguments

A pure function can have default arguments like a regular function, but the expression for the default value must itself follow the rules of a pure function.
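The reason for this rule shows up in today's Swift, where a default value expression is evaluated again on every call that omits the argument. In the sketch below, `nextID` and `makeLabel` are hypothetical names; because the default expression touches a global variable, the result of `makeLabel()` changes from one call to the next.

```swift
var callCount = 0

// Not pure in the sense of this proposal: it reads and writes a global variable.
func nextID() -> Int {
    callCount += 1
    return callCount
}

// The default value expression runs on each call that omits `id`.
func makeLabel(id: Int = nextID()) -> String {
    return "item-\(id)"
}

let first = makeLabel()        // default expression evaluated
let second = makeLabel()       // evaluated again, different result
let third = makeLabel(id: 7)   // default expression not evaluated
```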

Properties & subscripts

The getter and setter for a pure property or subscript must be pure, following the rules for pure functions.

A pure stored property can be a member of a struct or enum, a local variable, or an immutable global variable (let). Its didSet and willSet blocks, if present, must be pure, following the rules for pure functions. pure var is not allowed at global scope except for computed properties.

struct A {
	// a pure global constant
	static pure let defaultNumber = 8

	// a pure stored property
	pure var number: Int

	// a pure computed property
	pure var text: String {
		get {
			return String(number)
		}
		set {
			number = Int(newValue) ?? A.defaultNumber
		}
	}

	// didSet/willSet must be pure in a pure property
	pure var name: String {
		didSet {
			// error: NSApplication.shared is not pure, cannot access from pure function
			NSApplication.shared.mainWindow?.title = name
		}
	}
}

Class

Because classes are reference types and there can be multiple references pointing to the same object, special rules apply to them. Specifically, the criterion for pure is that writing to pure stored properties in a class requires the class to be uniquely referenced. This preserves value semantics and enables copy-on-write.

Copy on write

In a class, some functions can be cloning. Writing to a pure stored property inside a class is only possible from a cloning function. Property setters for pure properties are implicitly cloning. When a function is cloning, the self parameter is passed as inout, allowing the function to replace the object instance with another one when necessary.

This is similar to how mutating works for structs.

@objc is incompatible with cloning. An object derived from an Objective-C base class can have cloning members, but those members are only accessible from Swift since Objective-C does not support passing self as inout.

Functions

A pure function in a class follows the same rules as other pure functions.

A pure cloning function of a class takes the class reference as inout. This will allow checking whether the reference is unique. Calling a cloning function does not trigger this check until later when a stored property is written to.

Properties & subscripts

A property or subscript can be pure if both its getter and setter follow the rules of pure functions.

A pure stored property has a pure getter and a pure cloning setter. Calling the setter of a pure stored property will automatically verify that the object is a known unique reference using isKnownUniquelyReferenced. If the reference is not unique and the object conforms to the CopyableObject protocol, a copy of the object is assigned to self before assigning the new value. If the object does not conform to CopyableObject and the reference is not unique, this is a fatal error. More on CopyableObject later.

The setter of a pure property or subscript in a class is assumed to be cloning (taking self as inout) unless explicitly marked as noncloning.
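As a rough illustration of what a cloning setter could amount to, the sketch below writes the uniqueness check by hand in today's Swift. `Document`, its plain `init(copying:)`, and `setTitle(of:to:)` are hypothetical stand-ins, not part of the proposal; passing the reference as `inout` plays the role of the implicit inout self.

```swift
final class Document {
    var title: String
    init(title: String) { self.title = title }
    // Stand-in for the copying initializer a copyable class would provide.
    init(copying other: Document) { self.title = other.title }
}

// Hand-written equivalent of a cloning setter: the reference is passed inout
// so it can be replaced by a fresh copy when the object is shared.
func setTitle(of document: inout Document, to newTitle: String) {
    if !isKnownUniquelyReferenced(&document) {
        document = Document(copying: document)
    }
    document.title = newTitle
}

let original = Document(title: "Draft")
var shared = original               // second reference to the same object
setTitle(of: &shared, to: "Final")  // not unique: `shared` now points to a copy
```

The write lands in the copy, so the object still referenced by `original` is left unchanged.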

Deinitializers

deinit is called whenever an object's reference count reaches zero. While this can happen in the middle of a pure function, tasks performed by deinit typically can't be pure. In fact, making deinit pure would only guarantee that it doesn't accomplish anything thanks to value semantics (the only value it can affect is self, which is about to disappear anyway).

FIXME: So we have to tolerate that deinit could do something to compromise the purity of the function, like writing to a global variable. But even if we tolerate this possible breach of value semantics, this could pose a problem for thread-safety (deinit being called from the wrong thread).

`CopyableObject`

Classes with pure cloning members may conform to the CopyableObject protocol. This protocol defines an initializer to be called when there is more than one reference to the object and we need a uniquely referenced copy to write to.

protocol CopyableObject: class {
	pure required init(copying object: Self)
}

A default implementation copying all the pure stored properties could be synthesized by the compiler. It can't copy non-pure stored properties however because being a pure initializer it does not have access to them. In the presence of non-pure stored properties, no implementation is synthesized.

Note that there is no requirement that this be a deep copy. References to other classes can be shared between instances because the rules above guarantee that objects referenced by stored properties in the class will either have copy-on-write behavior or not be accessible from pure functions.
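The sketch below shows, in today's Swift and with the checks written by hand, why a shallow copy can be enough. `Inner`, `Outer`, and `setInnerValue(of:to:)` are hypothetical names: copying `Outer` shares the `Inner` instance, and the shared `Inner` is itself copied only when something writes to it.

```swift
final class Inner {
    var value: Int
    init(value: Int) { self.value = value }
    init(copying other: Inner) { value = other.value }
}

final class Outer {
    var inner: Inner
    init(inner: Inner) { self.inner = inner }
    // Shallow copy: the new Outer shares the same Inner instance.
    init(copying other: Outer) { inner = other.inner }
}

// Hand-written stand-in for nested cloning setters.
func setInnerValue(of outer: inout Outer, to newValue: Int) {
    if !isKnownUniquelyReferenced(&outer) {
        outer = Outer(copying: outer)
    }
    // After a shallow copy, the Inner object is referenced twice,
    // so it gets its own copy before the write.
    if !isKnownUniquelyReferenced(&outer.inner) {
        outer.inner = Inner(copying: outer.inner)
    }
    outer.inner.value = newValue
}

let first = Outer(inner: Inner(value: 1))
var second = first
setInnerValue(of: &second, to: 2)
```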

Note that a non-synthesized initializer could break purity by not copying the pure properties correctly.

Note about the Ownership Manifesto

The Ownership Manifesto describes a Copyable protocol which gives an object the ability to copy its value from one variable to another. This is a different concept from the CopyableObject protocol described here, which is a means for the compiler to provide copy-on-write behavior automatically.

Overrides

Overriding a function or property cannot weaken the pure guarantee made in the base class, but it can strengthen it. For instance, you can override a non-pure function with a pure function. Note however that a pure function is not allowed to call its superclass's non-pure implementation.

Closures

A closure is implicitly pure if it only calls pure functions, getters, setters, or subscripts. You can make this explicit by prefixing the parameter list with pure:

let closure = { pure (a: Int) -> Int in
	return a + 1
}

Conditional pure and pure constraints

In some cases a function will be pure in itself, but will have to call other functions passed through parameters. The purity of the function then depends on the purity of the passed arguments. For instance, here is a function taking a closure provided by the caller:

pure? func callMeBack(callback: pure? () -> Void) {
	callback()
}

When the callback parameter is pure, the callMeBack function is pure too. The question mark after pure means that the function is conditionally pure depending on the argument passed to it. The syntax proposed here works similarly to throws and rethrows.
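For comparison, here is how the existing rethrows mechanism behaves in today's Swift. `DemoError` and `log` are hypothetical names used only for this illustration. Whether the call needs `try` depends on the closure passed in, much as the purity of a conditionally pure function would depend on its argument.

```swift
struct DemoError: Error {}

// `rethrows`: this function throws only if its closure argument throws.
func callMeBack(_ callback: () throws -> Void) rethrows {
    try callback()
}

var log: [String] = []

// Non-throwing closure: the call itself cannot throw, so no `try` is needed.
callMeBack { log.append("called") }

// Throwing closure: the call must be marked with `try`.
do {
    try callMeBack { throw DemoError() }
} catch {
    log.append("caught")
}
```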

If the function takes a generic type, it can constrain its pure attribute to match one or more members of that generic type. In the following example, whether altHashValue is a pure function depends on whether T.hashValue is pure:

pure? func altHashValue<T: Hashable>(_ a: T) -> Int where T.hashValue: pure? {
	return a.hashValue &+ 1
}

The where T.hashValue: pure? clause constrains the altHashValue function to be pure only if T.hashValue is a pure property. The compiler must check that the whole body of the function follows the rules for a pure function, but makes an exception for T.hashValue. At the call site, the function is deemed pure only if T.hashValue is pure.

If we wanted altHashValue to always be pure instead of conditionally pure, the constraint can express that as well: remove the question mark from the pure attribute in front of altHashValue and from the constraint:

pure func altHashValue<T: Hashable>(_ a: T) -> Int where T.hashValue: pure {
	return a.hashValue &+ 1
}

Here the constraint tells us that T.hashValue must absolutely be pure for this function to be called.

Unsafe pure

For cases where a function is known to be pure but the compiler can't prove that it is, the standard library provides a wrapper function for calling non-pure functions. This should be particularly useful when wrapping code from other languages:

let result = unsafePure { someExternalFunctionInC() }

Unsafe pure works with conditional pure too:

pure? func callMeBack(callback: pure? @convention(c) () -> Void) {
	unsafePure {
		someExternalFunctionInC(callback)
	}
}

Thread-safety

Pure functions are automatically thread-safe since they do not have access to any shared mutable state.

It is safe to pass an object to a pure function running in another thread while continuing to use it in the current thread. This is because only the pure (non-cloning) members of the object are accessible to the pure function. The current thread is free to mutate the non-pure portions of the object.

Weak references

Weak references mutate automatically when the object at the other end vanishes. Stored properties containing weak references are thus not allowed to be pure, and are inaccessible to pure functions.

Since a pure function might stop referencing an object, the object might get deallocated causing weak references to that object to become nil. A pure function is normally not allowed to have side effects, but non-pure functions can observe the effect of the deallocation. Thread-safety is still preserved.

There is another noteworthy side effect of weak references regarding value semantics. Since isKnownUniquelyReferenced does not take weak references into account, a weak reference to an object could be used to observe the value of pure stored properties as they change, either from another thread or between successive calls to a pure function in the same thread. This could be fixed by having isKnownUniquelyReferenced return false in the presence of weak references.
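This behavior can be observed in today's Swift with the standard library's `isKnownUniquelyReferenced`, which counts only strong references. In the sketch below, `Counter`, `strong`, and `observer` are hypothetical names. The object passes the uniqueness check, so a write would happen in place, and the weak reference sees the new value.

```swift
final class Counter {
    var value = 0
}

var strong = Counter()
weak var observer = strong

// Only strong references are counted, so the weak one goes unnoticed.
let looksUnique = isKnownUniquelyReferenced(&strong)

// A write that relies on the check above would mutate the object in place...
strong.value = 42

// ...and the change is visible through the weak reference.
let observed = observer?.value
```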

Library evolution

Changing a non-pure function to a pure one in a future version of a library is allowed. A pure function that becomes non-pure is a breaking change however, so this should be avoided.

There is an exception however for open members in classes: changing them from non-pure to pure is not allowed as this would break overrides defined in other modules.

ABI considerations

To be determined.

Migration Strategy

While this feature is purely additive, it is expected that library authors will get significant pressure to make APIs pure everywhere possible so that their user base can make use of pure themselves. This might be a significant burden, especially for libraries with downstream dependencies not yet annotated for pure or libraries with code in other languages.

Given that the compiler is able to check if pure is valid for every function, we should offer a tool capable of proposing adding pure to declarations whenever they are eligible, in other words when all their dependencies are pure. A flag causing the compiler to emit those suggestions would work perfectly. Every time a library dependency is updated for pure, the tool could be re-run to see if more things can become pure.

Because pure is a commitment when it comes to public APIs, library authors should review more carefully the annotations suggested by the tool for anything that is public or open.

APIs imported from other languages can't be automatically checked for pure eligibility by the compiler. They would need to be annotated using a mechanism similar to nullability. The user on the Swift side can use unsafePure to wrap any external call if they are confident the function respects pure semantics.

Future Directions

Breaking away from value semantics

We could redefine pure simply as being thread-safe. This would have no impact on the mechanics in this proposal, but it would make it legal to use unsafePure to wrap any thread-safe code that does not follow the same-input-for-same-output requirement of a pure function. With this change pure would denote provably thread-safe code and unsafePure would be code that is assumed to be safe regardless of what the compiler says.

Regardless of what this document says about the semantics of pure, there is a possibility that this is how pure and unsafePure would end up being used in practice. The incentives to use pure this way are easy to see: auditing a few uses of unsafePure is much easier than auditing a bigger body of code. The more code you have that is guaranteed to be thread-safe with pure, the less you have to worry about thread safety.

A nice side effect of not requiring value semantics would be to allow pure deinitializers to do something useful. Deinitializers could be marked pure and accomplish things like closing file handles in a guaranteed thread-safe manner. Another effect of this is that properties with weak references would be allowed to be pure because the way they mutate is thread-safe.

Relaxing the definition of pure in this way would increase the annotation burden for existing code since there is a larger set of functions that are thread-safe.

If we change this, we'll need a more representative name than pure.

Pure actors

The Task-based concurrency manifesto discusses the concept of actors. Actors attempt to isolate some code to be run independently on a queue. Isolation is limited however, since actors operate in the same memory space as the rest of the program and have access to everything.

We could improve this by allowing a pure actor, an actor where every member is pure. Such an actor would benefit from compiler checked data isolation and would be suitable for computation tasks that do not require access to the outside world.

If however we break away from value semantics (as described in the previous section) it'd become possible to use a shared database, access the file system, and communicate through the network in a pure actor instead of being limited to pure computation tasks.

Optimization

In theory, the rules for pure make sure that for the same input you'll always get the same output. So the compiler could allow elision of repeated calls with the same argument by reusing the result from a preceding call.
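The transformation can be written out by hand in today's Swift. In this sketch, `square` and `tick` are hypothetical functions: merging the two calls to `square` preserves the result, while doing the same for `tick`, which updates a global counter, changes it.

```swift
// Depends only on its argument: repeated calls can be merged.
func square(_ x: Int) -> Int {
    return x * x
}

let twice = square(12) + square(12)   // two calls
let s = square(12)                    // one call, result reused
let reused = s + s

// Counter-example: a function with a side effect on a global variable.
var ticks = 0
func tick(_ x: Int) -> Int {
    ticks += 1
    return x + ticks
}

let impureTwice = tick(10) + tick(10) // two calls: 11 + 12
let t = tick(10)                      // third call: 13
let impureReused = t + t              // 26, not what two more calls would give
```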

This property cannot be proven whenever unsafePure is used. Normally, code wrapped in unsafePure is supposed to be pure even if this is not verifiable by the compiler, but there are incentives to break this rule in some cases (described in the two preceding sections). It might be more prudent to not allow this optimization in general so that things work in a more predictable way when unsafePure is used in a way that breaks the semantic guarantees pure is supposed to offer.

Alternatives considered

`nonpure` for open members

As a special case, open functions and properties in classes cannot become pure in a future version of a library; this would break compatibility with existing overrides outside of the module. We could make it a tristate where the default allows maximum compatibility for future versions of the library:

open: calling is not pure, but overrides must be pure (maximum future compatibility, less usefulness)
open pure: calling is pure, overrides must be pure
open nonpure: calling is not pure, overrides may or may not be pure

That would however be a major source-breaking change for libraries, as anything currently open would have to be relabeled open nonpure for the current overrides to continue to work. Automatic migration is a possibility. Another would be to keep the current meaning of open and use another word for the maximum compatibility option:

open strict: calling is not pure, but overrides must be pure (maximum future compatibility, less usefulness)
open pure: calling is pure, overrides must be pure
open: calling is not pure, overrides may or may not be pure

The utility of having strict when it is not the default is up for debate however.

Implicit pure

Pure could be inferred within the module for members that do not have an explicit pure attribute set using the following rules:

1. Any let variable is implicitly pure.
2. A function is implicitly pure if it does not call any non-pure function.
3. A computed property is implicitly pure if its implementation does not call any non-pure function.
4. A stored property is implicitly pure if it is immutable (let) or is a mutable (var) member of a struct, enum, or a local variable.

Specifically not included in (4) are global mutable stored properties and mutable stored properties in class instances.

Implicit pure is not visible to other modules. Public functions must be explicitly marked pure as a commitment that the implementation will stay pure in future revisions of the module.

Implicit pure has some drawbacks. While it might improve progressive disclosure by allowing functions to be used where they need to be pure without explicit annotations, this means that a small change in the implementation of a function can make it non-pure and cause a cascade of functions depending on the first one to become non-pure, causing hard to decipher errors far from where the change took place.

So this proposal applies implicit pure only to closures. The same rules as above could be used by a migration tool to annotate pure functions however.

Deinitializer alternatives

Another possibility is to disallow classes with deinitializers from being used inside of pure functions. This would end up limiting us to final classes however, since you can't prove that all derived classes of a base class will have no deinitializer.
