This repository contains source code of a shallow water partial differential equation (PDE) solver. The solver is written in differentiable Swift which allows end-to-end propagation of derivatives through time. In physics this kind of solution is traditionally called adjoint and it unlocks a new class of algorithms for optimal control of PDEs.
More details about the PDE and how to derive it from general Navier-Stokes equations can found for example on Wikipedia.
The following notebook demonstrates how the differentiability of the solver can be leveraged to solve an otherwise tricky optimal control problem.
The entire project is inspired by DiffTaichi paper, in particular the wave.py
example. Besides exploring and having fun, the main goal is to benchmark differentiable Swift against the DiffTaichi framework. For the time being, the focus is only on single core CPU performance.
DiffTaichi uses an imperative programming language and transforms the Python source code in a way that preserves arithmetic intensity and parallelism. For my personal interests, DiffTaichi provides an excellent benchmark. It uses a very lightweight tape to record operations for the backward pass and focuses on physical simulations. The lightweight tape gives good performance in cases, where the studied algorithm operates on small matrices. This typically happens in physical system simulations. Tight loops and element-wise modification of matrices generally pose a significant performance bottleneck for traditional ML frameworks like PyTorch or TensorFlow 2. These frameworks were designed to run well in situations when the runtime is dominated by linear algebra operations on large matrices.
Swift for TensorFlow aims to solve these kinds of problems by integrating the differentiability directly into the compiler. This will eventually bring a wide palette of benefits, which are in depth described in this manifesto. However, as of version 0.10 the differentiability features are still new and to a large extent experimental. So before implementing a larger project it's worthwhile to spend some time and try to poke hole in the whole thing. First, this helps to better understand the strengths and weaknesses of the framework. And second, one can gain more experience how to implement different algorithms and ideas.
-
Reimplement DiffTaichi's
wave.py
shallow water PDE solver example using only Foundation (i.e.[[Float]]
or[[Double]]
). This will serve two purposes. First, get myself familiar with the differentiable API. And second, serve as a baseline to make sure the other results are not completely off. Optionally, if the results don't make sense, in forward-only mode the Foundation-only implementation should have a similar performance as a C implementation. -
Reimplement DiffTaichi's
wave.py
with TensorFlow (i.e.Tensor<Float>
) data types. I'm curious to see how it fares against the Foundation only version. Chances are it's going to be very slow do to eager mode. Hopefully compiling the time-stepping with XLA will make things better. -
Benchmark the Swift reimplementations against DiffTaichi on a simple problem that doesn't require differentiability. This will give a baseline for the expensiveness of the backward pass relative to the forward one.
-
Benchmark solving an optimization problem that also exercises the backward pass. Ideally, the parameters like the the grid size and number of time-steps should stay the same so the runtimes of the backward and forward pass can be compared.
Solution of the shallow water hyperbolic PDE is calculated with finite-difference discretization on unit square. Time-stepping uses semi implicit Euler's schema. Time step itself is limited to stay below the Courant–Friedrichs–Lewy numerical stability limit. Values around the edges of the domain are subject to trivial Dirichlet boundary conditions - i.e. they're equal to zero with an arbitrary normal derivative.
The bread and butter of the solver is the Laplace operator. Its evaluation is the bulk of the execution time. This particular implementation approximates the laplacian with five-point stencil finite-differencing.
The repository contains 3 different implementations of the same algorithm. The key difference is the underlying data storage of the discretized Laplace operator.
- A -
ArrayLoopSolution.swift
: Uses[[Float]]
for storage. Calculations are done in a tight 2D loop. - B -
TensorLoopSolution.swift
: UsesTensor
for storage. Calculations are done in a tight 2D loop. - C -
TensorSliceSolution.swift
: Uses arithmetic operations on slices ofTensor
and no loops. - D -
TensorConvSolution.swift
: Uses 2D convolution with a fixed kernel for calculations,Tensor
and no loops.
A and B are written in an imperative style that is very close to DiffTaichi. These are therefore the two most representative candidates for benchmarking. C was written more or less out of desperation from bugs encountered while working on A and B. It uses arithmetics on tensor slices in place of the 2D loops.
The discretized Laplace operator can be viewed as a convolution or also a special type of a Gaussian blur operation. D uses a fixed 3x3 single channel weight matrix and runs convolution on a 2D greyscale image. Shades of gray in this case represent the height of the water surface.
In some sense, solutions A and B reimplement convolution from scratch using loops. From the performance perspective, this is a very bad idea. The problem here is that not all discretizations of the laplacian can be written as a convolution. So this benchmark has some relevance for these alternative discretizations that operate on unstructured meshes. Typical examples of this is FVM (finite volume method) or FEM (finite element method).
The following table contains results of the benchmark on macOS S4TF development toolchain 2020-09-16-a
. There's no conclusion yet.
name time std iterations warmup
-------------------------------------------------------------------------------------------
Shallow Water PDE Solver.C Loop 355.480 ms ± 4.35 % 10 nan ms
Shallow Water PDE Solver.Array Loop 665.574 ms ± 1.52 % 10 1298.180 ms
Shallow Water PDE Solver.Tensor Slice 536.672 ms ± 4.60 % 10 1201.721 ms
Shallow Water PDE Solver.Tensor Slice (XLA) 640.667 ms ± 1.91 % 10 1470.422 ms
Shallow Water PDE Solver.Tensor Conv 7502.167 ms ± 0.21 % 10 14969.662 ms
Shallow Water PDE Solver.Tensor Conv (XLA) 562.674 ms ± 0.25 % 10 1333.303 ms
More details and the benchmark source code can be found in the following notebook.
Benchmarks.ipynb
The plan is to revisit the project again when the differentiability of coroutines is working. In particular, at the moment it's not possible to differentiate through indexed assignments like anArray[i] = someVal
. Progress on these issues is tracked under the following links: