@robertmryan
Created September 18, 2024 20:20
import Foundation

/// Property wrapper that guards reads and writes of the wrapped value with a lock.
@propertyWrapper
struct Synchronized<T>: @unchecked Sendable {
    private var _wrappedValue: T
    private let lock = NSLock()

    var wrappedValue: T {
        get { lock.withLock { _wrappedValue } }
        set { lock.withLock { _wrappedValue = newValue } }
    }

    init(wrappedValue: T) {
        _wrappedValue = wrappedValue
    }
}

/// Reference type whose mutable state is only reachable through synchronized properties.
final class ComplexData: @unchecked Sendable {
    @Synchronized var firstName: String
    @Synchronized var lastName: String

    init(firstName: String, lastName: String) {
        self.firstName = firstName
        self.lastName = lastName
    }
}

actor Foo {
    func process1(lotsOfData: [ComplexData]) async {
        await withTaskGroup(of: Void.self) { group in
            for data in lotsOfData {
                group.addTask {
                    // Do complex things with complex data, and then give data to another process
                    await self.process2(data: data)
                    // I will never ever do anything more with data
                }
            }
        }
    }

    nonisolated func process2(data: ComplexData) async {
        // Do complex things with complex data, and then give data to another process
        await process3(data: data)
        // I will never ever do anything more with data
    }

    nonisolated func process3(data: ComplexData) async {
        // Do complex things with complex data, and I am done
    }
}

robertmryan commented Sep 18, 2024

The burden of unstructured concurrency is that you do not enjoy automatic cancellation propagation; you have to handle cancellation yourself. Consider this rendition of process2 that uses unstructured concurrency. We would use withTaskCancellationHandler to handle cancellation properly:

nonisolated func process2(data: ComplexData) async {
    //  Do complex things with complex data, and then give data to another process
    let task = Task.detached {
        await self.process3(data: data)
    }
    //  I will never ever do anything more with data

    await withTaskCancellationHandler {
        await task.value
    } onCancel: {
        task.cancel()
    }
}

Now, as I commented above, I wouldn’t use unstructured concurrency at all. But if you did, you would probably want to handle cancellation as shown here.


robertmryan commented Sep 18, 2024

This also raises the question, even if you did want to use unstructured concurrency, of why process2 would use Task.detached {…} rather than Task {…}. In Swift concurrency, where asynchronous code runs is determined by the isolation of the function you are calling, not by the caller. It is not like DispatchQueue.global().async {…}, where the burden was often on the caller to get the work onto the appropriate queue. The declaration of process3 (here, nonisolated) determines on what thread/queue/context that function runs, so Task.detached does not have the same utility as our old GCD global queues.
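
To illustrate the point that the callee's isolation decides where code runs, here is a minimal sketch. The `Worker` actor and its method names are hypothetical, and it assumes Swift 6.0 default behavior, in which a nonisolated async function does not run on the caller's actor (SE-0338).

```swift
actor Worker {
    private(set) var completed = 0

    // Actor-isolated: always runs on the actor, whoever calls it.
    func recordCompletion() {
        completed += 1
    }

    // nonisolated + async: under Swift 6.0 defaults this runs off the actor,
    // on the cooperative thread pool, regardless of how it was called.
    nonisolated func heavyWork(upTo n: Int) async -> Int {
        (1...n).reduce(0, +)
    }

    func run() async -> Int {
        // Task {…} inherits the actor's isolation, yet heavyWork still leaves
        // the actor because of its own declaration. Task.detached adds nothing here.
        let task = Task { await self.heavyWork(upTo: 1_000) }
        let sum = await task.value
        recordCompletion()
        return sum
    }
}
```

The only thing Task.detached would change in this sketch is where the closure itself starts, not where heavyWork executes.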

@robertmryan

Another observation: Can you give us examples of what process1, process2 and process3 are doing? Is it slow/synchronous work, or are you awaiting other asynchronous functions?

I ask because we have a contract with Swift concurrency to never block a thread in the cooperative thread pool. If it is a lengthy compute process with a loop, we might periodically await Task.yield() (and try Task.checkCancellation()) in that loop. If you are calling some API that is slow and synchronous, you might even need to run it outside of Swift concurrency and then bridge it back with a continuation.

Bottom line, this example is so abstract that it is hard to provide concrete counsel. But the take-home message of this gist is that you should avoid unnecessary unstructured concurrency and, where it is absolutely needed, make sure to add your own cancellation handling code.

@robertmryan

By the way, I might also contemplate a further simplification (my above caveats notwithstanding):

actor Foo {
    func process1(lotsOfData: [ComplexData]) async {
        await withTaskGroup(of: Void.self) { group in
            for data in lotsOfData {
                group.addTask {
                    await self.process2(data: data)
                    await self.process3(data: data)
                }
            }
        }
    }

    nonisolated func process2(data: ComplexData) async {
        //  Do complex things with complex data, and then return so caller can give data to another process
    }

    nonisolated func process3(data: ComplexData) async {
        //  Do complex things with complex data, and I am done
    }
}

@Mini-Stef

> The burden of unstructured concurrency is that you do not enjoy automatic cancellation propagation, but rather you have to handle cancellation yourself. Consider this rendition of process2 that uses unstructured concurrency. We would use withTaskCancellationHandler to handle cancellation properly:
>
> Now, as I commented above, I wouldn’t use unstructured concurrency at all: But if you were, you probably want to handle cancellation like here.

First of all, many thanks for taking the time to write all this. It is very useful and helps me better understand Swift concurrency. As you might have guessed, I am coming from "old" real-time systems where we directly manage tasks/threads, semaphores and interrupts, so this is quite a change.

Aside from cancellation, are there any other disadvantages of unstructured concurrency?
For the time being, cancellation is not a requirement, maybe just because it was not available until now.

@Mini-Stef

> Another observation: Can you give us examples of what process1, process2 and process3 are doing?

I can't disclose the content, but we are indeed doing heavy computations. In the very early days, the system was one big linear algorithm running in a single thread (and in another language). With the arrival of multi-core processors, the scientists split the algorithm into 3 roughly equivalent chained parts, so running 3 tasks on 3 cores allowed us to pipeline the work. That gave a huge speed improvement, as 3 data items could be processed simultaneously (almost: the gain is not 3x, more like 2x, but still...).
In Swift 5 I had done that as I described, and I am just looking for a way to make it Swift 6 friendly.

@Mini-Stef

So thanks again for the help. I'll take time to think about all that and find the most convenient way forward. After reading this, I think I still need some time to really understand the concepts of actors and isolation. I am still thinking too much in terms of tasks/threads and exchange of data.
I'll post the final version on SO.

Maybe, maybe, Swift could be improved with some kind of mechanism to say "I am passing this data to another thread/isolation context and I won't touch it again", and actually check it's the case... (good old "message passing" of RT systems :-))


robertmryan commented Sep 19, 2024

@Mini-Stef

> I can't disclose the content, but we are indeed doing heavy computations.

I don’t need to know precisely what you are doing, but I could provide better counsel if I understood what type of work you are doing. E.g., if your code consists of looping while doing a calculation, then an occasional await Task.yield() and try Task.checkCancellation() inside that loop (or every nth iteration) might be sufficient.
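
A minimal sketch of that pattern, assuming a simple per-element calculation. The function name `sumOfSquares` and the `checkEvery` interval are hypothetical, and the right interval would have to be found by measuring.

```swift
// Long-running, CPU-bound loop that stays cooperative: every `checkEvery`
// iterations it checks for cancellation and yields the thread to other tasks.
func sumOfSquares(_ values: [Double], checkEvery interval: Int = 10_000) async throws -> Double {
    var total = 0.0
    for (index, value) in values.enumerated() {
        if index.isMultiple(of: interval) {
            try Task.checkCancellation()   // throws CancellationError if the task was canceled
            await Task.yield()             // lets other tasks run on this thread
        }
        total += value * value
    }
    return total
}
```

Checking only every nth iteration keeps the overhead of the suspension points small relative to the calculation itself.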

But then again, if that impacts performance too much and/or you are calling some blocking API over which you have no control, then you might move it out of Swift concurrency and bridge it back with a continuation. I might advise checking out the WWDC 2022 video Visualize and optimize Swift concurrency, which says:

> If you have code that needs to do [work that can block the thread], move that code outside of the concurrency thread pool – for example, by running it on a DispatchQueue – and bridge it to the concurrency world using continuations.

That avoids blocking the Swift concurrency cooperative thread pool, which is limited to the number of CPU cores on the device. As they discuss in that video, if you block the threads of the cooperative thread pool, you can deadlock and/or cause other problems. Moving these blocking calls back to GCD avoids that potential problem.
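
A minimal sketch of such a bridge, assuming the slow call is synchronous and cannot fail. `blockingChecksum` is a hypothetical stand-in for the blocking API, and `checksum(of:)` is an illustrative name.

```swift
import Foundation

// Hypothetical stand-in for a slow, synchronous API that we do not control.
func blockingChecksum(_ bytes: [UInt8]) -> Int {
    bytes.reduce(0) { $0 &+ Int($1) }
}

// Runs the blocking call on a GCD queue, outside the cooperative thread pool,
// and resumes the awaiting task exactly once when the result is ready.
func checksum(of bytes: [UInt8]) async -> Int {
    await withCheckedContinuation { continuation in
        DispatchQueue.global(qos: .utility).async {
            continuation.resume(returning: blockingChecksum(bytes))
        }
    }
}
```

If the blocking call could fail, withCheckedThrowingContinuation would be the natural variant. Note that a continuation on its own does not respond to cancellation.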

@Mini-Stef

Thank you again so much for your time and advice.

The algorithm we are using does not make any blocking calls. It loops through a vast number of elements (millions; these millions constitute one complexData, which also carries metadata) and applies some maths to each of the elements and to groups of them. There are 3 passes, with three different types of formulas applied to the original complexData plus the outcome of the previous passes.
We can have a list of thousands of complexData to process per day.

But all in all, I am not really after cancellation. I admit it may be an interesting added feature, but that's not my goal right now. I had to struggle to make the case for moving from the old C version to Swift, and we did get loads of benefits from that (including removing dormant bugs). I am now just trying to move to Swift 6 to see if we could get further benefits, but I am not directly looking for improved performance or features, at least right now. I was surprised that I didn't get that many warnings or errors while switching to Swift 6.
But OK, I do get the point that structured concurrency will allow me to introduce some cancellation in the future.

I have been reading a lot about isolation contexts, and I now think I understand better what they are. Admittedly, complexData_N does not need to move from one isolation context to another. We just need to make sure that complexData_N and complexData_N+1 are processed in parallel (said differently, as soon as a core becomes free, we need it to start processing a complexData). I guess that's exactly what

    func process1(lotsOfData: [ComplexData]) async {
        await withTaskGroup(of: Void.self) { group in
            for data in lotsOfData {
                group.addTask {
                    await self.process2(data: data)
                    await self.process3(data: data)
                }
            }
        }
    }

does.

I am still wondering how non-Sendable data could be moved from one isolation context to another if we promise that the sending context will forever forget about that data. Maybe that's still old-way thinking... but I guess there might be use cases.

Once again, thank you very much.
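
On that last question, Swift 6 does have a mechanism along these lines: region-based isolation (SE-0414) and the `sending` parameter modifier (SE-0430). The following is only a minimal sketch of how that could look. The `SampleBuffer` and `Stage` types are hypothetical, and it assumes a Swift 6 toolchain.

```swift
// Deliberately not Sendable: a mutable class with no synchronization.
final class SampleBuffer {
    var samples: [Double]
    init(samples: [Double]) { self.samples = samples }
}

actor Stage {
    // `sending` states that the caller hands the value over and gives it up.
    func ingest(_ buffer: sending SampleBuffer) -> Double {
        buffer.samples.reduce(0, +)
    }
}

func runPipeline() async -> Double {
    let buffer = SampleBuffer(samples: [1, 2, 3])
    buffer.samples.append(4)                  // fine: the buffer has not been sent yet
    let total = await Stage().ingest(buffer)  // ownership moves into the actor
    // buffer.samples.append(5)               // rejected in Swift 6 language mode: `buffer` was already sent
    return total
}
```

The compiler performs the "I won't touch it again" check at the call site, so no runtime locking is needed for the handoff.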
