Apple's Swift 4 recently introduced some amazing new features to support archiving/unarchiving as part of Foundation. This post explores some techniques for unarchiving/decoding while ensuring that instances of reference types can be shared within the object graph based on an arbitrary identifier.
Stepping back a little, imagine you had a simple construct representing a Car and its owner/driver. Using value types, this might look like:
struct Person : Codable {
    let identifier: Int
}
struct Car : Codable {
    let owner: Person
    let driver: Person
}
let kate = Person(identifier: 200)
let car = Car(owner: kate, driver: kate)
When a car gets encoded, it looks something like:
{
    "owner" : { "identifier" : 200 },
    "driver" : { "identifier" : 200 }
}
which is great. When that JSON gets decoded, fresh new copies of the Person and Car structs are constructed. Lovely.
However, imagine that your Person object was actually a reference type instead of a value type. When decoding in this case, two separate instances of Person will be created and assigned to owner and driver respectively. The decoded object graph then no longer matches the original object graph, in which a single Person instance is shared between the owner and driver properties.
This article explores some ways we can achieve the desired behaviour based on a modified Person type that now looks like:
class Person : Codable {
    let identifier: Int
    init(identifier: Int) {
        self.identifier = identifier
    }
}
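To make the problem concrete, here is a minimal sketch that round-trips a Car through JSONEncoder and JSONDecoder and compares object identity before and after. The names original, decoded, sharedBefore and sharedAfter are mine, purely for illustration.

```swift
import Foundation

final class Person: Codable {
    let identifier: Int
    init(identifier: Int) { self.identifier = identifier }
}

struct Car: Codable {
    let owner: Person
    let driver: Person
}

let kate = Person(identifier: 200)
let original = Car(owner: kate, driver: kate)

// Round-trip through JSON using the synthesized Codable conformance.
let data = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(Car.self, from: data)

// === compares object identity, not the identifier value.
let sharedBefore = original.owner === original.driver  // one shared instance
let sharedAfter = decoded.owner === decoded.driver     // two separate instances
print(sharedBefore, sharedAfter)
```

The identifiers still match after decoding; it is only the sharing of the instance that is lost.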
A really simple approach is to use the nestedContainer(keyedBy:forKey:) function on KeyedDecodingContainer to interrogate the child elements to see if their identifiers are the same. The basic logic here is:
- Decode the owner object
- Find the driver's identifier
- Re-use the owner object if the identifiers are the same
- Decode into a new object if they are different.
The code for this might look like:
struct Car : Codable {
    ...
    enum PersonKeys: String, CodingKey {
        case identifier
    }
    public init(from decoder: Decoder) throws {
        let myContainer = try decoder.container(keyedBy: Car.CodingKeys.self)
        owner = try myContainer.decode(Person.self, forKey: .owner)
        let driverContainer = try myContainer.nestedContainer(keyedBy: Car.PersonKeys.self, forKey: .driver)
        let driverId = try driverContainer.decode(Int.self, forKey: .identifier)
        if driverId == owner.identifier {
            driver = owner
        }
        else {
            driver = try myContainer.decode(Person.self, forKey: .driver)
        }
    }
}
For models as simple as the above, this technique is workable. However, one drawback is that it only works for object graphs where all of the shared references sit within the same Codable element. For example, imagine a JSON structure that had many different Person objects scattered throughout the tree. The above code only shares instances at a local level, within a single Car.
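As a rough illustration of that limitation, the sketch below decodes an array of two cars that both reference person 200, using the same local comparison. The JSON payload and the names cars, sharedWithinCar and sharedAcrossCars are made up for this example.

```swift
import Foundation

final class Person: Codable {
    let identifier: Int
    init(identifier: Int) { self.identifier = identifier }
}

struct Car: Decodable {
    let owner: Person
    let driver: Person

    enum CodingKeys: String, CodingKey { case owner, driver }
    enum PersonKeys: String, CodingKey { case identifier }

    // Local comparison only: re-use the owner when the driver has the same identifier.
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        owner = try container.decode(Person.self, forKey: .owner)
        let driverContainer = try container.nestedContainer(keyedBy: PersonKeys.self, forKey: .driver)
        let driverId = try driverContainer.decode(Int.self, forKey: .identifier)
        if driverId == owner.identifier {
            driver = owner
        } else {
            driver = try container.decode(Person.self, forKey: .driver)
        }
    }
}

// Illustrative payload: the same person appears in two different cars.
let json = """
[
  { "owner": { "identifier": 200 }, "driver": { "identifier": 200 } },
  { "owner": { "identifier": 200 }, "driver": { "identifier": 200 } }
]
""".data(using: .utf8)!

let cars = try JSONDecoder().decode([Car].self, from: json)

let sharedWithinCar = cars[0].owner === cars[0].driver   // shared inside one Car
let sharedAcrossCars = cars[0].owner === cars[1].owner   // not shared between Cars
print(sharedWithinCar, sharedAcrossCars)
```

Each Car gets its own Person for identifier 200, because nothing remembers instances between calls to init(from:).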
Expanding on the previous example, the next logical step would be to create a cache that contains the previously found Person objects. Then, as the data is getting decoded, we can pull out the identifier field of the Person sub-structure and use that to read from the cache.
The code to do this might look something like:
var personCache = [Int:Person]()
struct Car : Codable {
    ...
    public init(from decoder: Decoder) throws {
        let myContainer = try decoder.container(keyedBy: Car.CodingKeys.self)
        let ownerContainer = try myContainer.nestedContainer(keyedBy: Car.PersonKeys.self, forKey: .owner)
        let ownerId = try ownerContainer.decode(Int.self, forKey: .identifier)
        owner = try personCache[ownerId] ?? myContainer.decode(Person.self, forKey: .owner)
        personCache[ownerId] = owner
        let driverContainer = try myContainer.nestedContainer(keyedBy: Car.PersonKeys.self, forKey: .driver)
        let driverId = try driverContainer.decode(Int.self, forKey: .identifier)
        driver = try personCache[driverId] ?? myContainer.decode(Person.self, forKey: .driver)
        personCache[driverId] = driver
    }
}
This is better, in that it can handle Person objects that may be present anywhere in the inbound JSON. However, there are still a couple of downsides to this approach. Specifically:
- This technique means Person objects are shared across multiple decoding sessions which may, or may not, be what you want
- The cache is vulnerable to all sorts of threading issues
- It is pretty verbose
- Using a global is generally icky
Swift 4's JSONDecoder provides a useful userInfo property that we can use to localise our cache to the single decoding session. That might look something like:
// a unique key to be able to find the cache
let personCacheKey = CodingUserInfoKey(rawValue: "DecodableCache")!
// a simple typealias for our cache
typealias PersonCache = [Int:Person]
struct Car : Codable {
    ...
    public init(from decoder: Decoder) throws {
        let myContainer = try decoder.container(keyedBy: Car.CodingKeys.self)
        // get a reference to our session cache
        var personCache = decoder.userInfo[personCacheKey] as! PersonCache
        let ownerContainer = try myContainer.nestedContainer(keyedBy: Car.PersonKeys.self, forKey: .owner)
        let ownerId = try ownerContainer.decode(Int.self, forKey: .identifier)
        owner = try personCache[ownerId] ?? myContainer.decode(Person.self, forKey: .owner)
        personCache[ownerId] = owner
        let driverContainer = try myContainer.nestedContainer(keyedBy: Car.PersonKeys.self, forKey: .driver)
        let driverId = try driverContainer.decode(Int.self, forKey: .identifier)
        driver = try personCache[driverId] ?? myContainer.decode(Person.self, forKey: .driver)
        personCache[driverId] = driver
    }
}
let decoder = JSONDecoder()
decoder.userInfo = [ personCacheKey : PersonCache() ]
try decoder.decode(Car.self, from: data)
This is a little better, as our Person cache is constrained to a single JSON decoding session, yet we still have the opportunity of sharing the same PersonCache across decoding sessions (if required) by passing it in when setting up the userInfo.
Having said that, though, I still don't like that:
- The force cast when extracting the cache from the userInfo is horrible
- It relies on the caller setting up the userInfo object in the JSONDecoder which is very brittle.
Note: In order to mitigate the second point, I initially attempted to lazily initialise the cache when fetching it from the userInfo. However, that doesn't work because at the point where we consume userInfo, it is referenced from the base Decoder type, where it is declared as read-only. We are able to set it up initially, though, because JSONDecoder declares its own userInfo as writable.
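One detail worth flagging in the userInfo version: [Int:Person] is a value type, so the var personCache inside init(from:) is a copy of the dictionary stored in userInfo, and entries added to it are not visible to other init(from:) calls. A possible variation, sketched below, is to store a reference type in userInfo instead, which also lets us swap the force cast for a thrown error. PersonCacheBox, MissingCacheError and the person(in:forKey:cache:) helper are hypothetical names I've invented for this sketch, and it assumes single-threaded decoding. It still relies on the caller setting up userInfo, so the brittleness remains.

```swift
import Foundation

final class Person: Codable {
    let identifier: Int
    init(identifier: Int) { self.identifier = identifier }
}

// Hypothetical reference-type cache: userInfo holds a reference to it, so writes are shared.
final class PersonCacheBox {
    var people: [Int: Person] = [:]
}

// Hypothetical error thrown when the caller forgot to install the cache.
struct MissingCacheError: Error {}

let personCacheKey = CodingUserInfoKey(rawValue: "DecodableCache")!

struct Car: Decodable {
    let owner: Person
    let driver: Person

    enum CodingKeys: String, CodingKey { case owner, driver }
    enum PersonKeys: String, CodingKey { case identifier }

    init(from decoder: Decoder) throws {
        // Conditional cast instead of a force cast.
        guard let cache = decoder.userInfo[personCacheKey] as? PersonCacheBox else {
            throw MissingCacheError()
        }
        let container = try decoder.container(keyedBy: CodingKeys.self)
        owner = try Car.person(in: container, forKey: .owner, cache: cache)
        driver = try Car.person(in: container, forKey: .driver, cache: cache)
    }

    // Peek at the identifier, then either re-use the cached Person or decode and remember a new one.
    private static func person(in container: KeyedDecodingContainer<CodingKeys>,
                               forKey key: CodingKeys,
                               cache: PersonCacheBox) throws -> Person {
        let nested = try container.nestedContainer(keyedBy: PersonKeys.self, forKey: key)
        let id = try nested.decode(Int.self, forKey: .identifier)
        if let cached = cache.people[id] { return cached }
        let decoded = try container.decode(Person.self, forKey: key)
        cache.people[id] = decoded
        return decoded
    }
}

// Illustrative payload: person 200 appears in both cars.
let json = """
[
  { "owner": { "identifier": 200 }, "driver": { "identifier": 200 } },
  { "owner": { "identifier": 200 }, "driver": { "identifier": 300 } }
]
""".data(using: .utf8)!

let decoder = JSONDecoder()
decoder.userInfo[personCacheKey] = PersonCacheBox()
let cars = try decoder.decode([Car].self, from: json)
```

With the box in place, person 200 should come back as the same instance in both cars, while person 300 gets its own.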
Broadly speaking, though, I think we're on the right track. Let's do a little bit of refactoring to start to tidy things up. The first thing we should do is encapsulate the cache into something a little nicer.
class DecodableCache<Key, Value> where Key : Hashable {
    private var values: [Key:Value] = [:]
    subscript(index: Key) -> Value? {
        get {
            return values[index]
        }
        set(newValue) {
            values[index] = newValue
        }
    }
}
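A quick usage sketch, assuming the cache is generic over Key and Value (as the body of the class implies). Because DecodableCache is a class, every holder of the cache sees the same entries, which is exactly what we want from a shared cache. The names sameCache, hit, viaAlias and miss are just for illustration.

```swift
final class Person {
    let identifier: Int
    init(identifier: Int) { self.identifier = identifier }
}

class DecodableCache<Key, Value> where Key: Hashable {
    private var values: [Key: Value] = [:]
    subscript(index: Key) -> Value? {
        get { return values[index] }
        set { values[index] = newValue }
    }
}

let cache = DecodableCache<Int, Person>()
let kate = Person(identifier: 200)
cache[200] = kate

// A second reference to the same cache object, not a copy.
let sameCache = cache
sameCache[300] = Person(identifier: 300)

let hit = cache[200]        // the instance stored above
let viaAlias = cache[300]   // written through sameCache, visible through cache
let miss = cache[400]       // nothing stored for this key
```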
Now, let's extend the Decoder object to provide a type-safe function that returns an instance of our DecodableCache object. As mentioned earlier, I tried to use the userInfo object on Decoder, however, for some reason Apple has marked Decoder.userInfo as read-only, whereas JSONDecoder.userInfo is read-write. Because we want the cache to be available to all decoders (not just JSON), we need to look at alternatives.
One such alternative is associated objects. We can extend Decoder in the following way:
// associated objects are looked up by the key's address, so the key lives at file scope to keep that address stable
private var personCacheKey: UInt8 = 0
extension Decoder {
    private var personCache: DecodableCache<Int, Person> {
        if let cache = objc_getAssociatedObject(self, &personCacheKey) as? DecodableCache<Int, Person> {
            return cache
        }
        else {
            let cache = DecodableCache<Int, Person>()
            objc_setAssociatedObject(self, &personCacheKey, cache, .OBJC_ASSOCIATION_RETAIN)
            return cache
        }
    }
}
This offers a lazily instantiated type-safe cache variable on the Decoder object.
Now that we've improved our cache infrastructure, we'll also take this opportunity to refactor out the common Person handling into another extension method on Decoder using the following code:
extension Decoder {
    func nestedPerson(container: KeyedDecodingContainer<Car.CodingKeys>, forKey key: Car.CodingKeys) throws -> Person {
        let personContainer = try container.nestedContainer(keyedBy: Car.PersonKeys.self, forKey: key)
        let personId = try personContainer.decode(Int.self, forKey: .identifier)
        if let cachedPerson = personCache[personId] {
            return cachedPerson
        }
        let decodedPerson = try container.decode(Person.self, forKey: key)
        personCache[personId] = decodedPerson
        return decodedPerson
    }
}
The above fetches a nested "person" container and extracts the identifier field to use as the key into the cache. If the object is found, then it will be returned. If it is not found, the object will be decoded using the standard decoding process, added to the cache, and then returned. This simplifies the decoding process for Car greatly, which now looks like:
struct Car : Codable {
    ...
    public init(from decoder: Decoder) throws {
        let myContainer = try decoder.container(keyedBy: Car.CodingKeys.self)
        self.owner = try decoder.nestedPerson(container: myContainer, forKey: .owner)
        self.driver = try decoder.nestedPerson(container: myContainer, forKey: .driver)
    }
}
One thing I did notice when writing the nestedPerson function is that when referencing Car.CodingKeys in a generic function signature, the compiler forces me to explicitly declare the enum. Oddly, though, I can reference the (generated) enum with no problems inside a function body. I have raised a Swift 4 defect, but for now, to work around it you just have to explicitly declare the CodingKeys enum:
enum CodingKeys: String, CodingKey {
    case owner
    case driver
}
As you can see from the above, it was a bit of a journey to arrive at the eventual solution. I think that there is still opportunity to further improve the solution (I'd love to be able to use userInfo instead of associated objects), but I think the general technique is sound.
If you have any comments, thoughts or suggestions, feel free to comment here or hit me up on Twitter.
Full source code that can be run in an Xcode 9 playground is available in this gist.