Skip to content

Instantly share code, notes, and snippets.

@vojtamolda
Last active January 2, 2023 16:57
Show Gist options
  • Save vojtamolda/bd85033cf62877e6f8ada68b8bbb32a0 to your computer and use it in GitHub Desktop.
Save vojtamolda/bd85033cf62877e6f8ada68b8bbb32a0 to your computer and use it in GitHub Desktop.
Benchmark.ipynb

Differentiable Shallow Water PDE Solver

This repository contains source code of a shallow water partial differential equation (PDE) solver. The solver is written in differentiable Swift which allows end-to-end propagation of derivatives through time. In physics this kind of solution is traditionally called adjoint and it unlocks a new class of algorithms for optimal control of PDEs.

More details about the PDE and how to derive it from general Navier-Stokes equations can found for example on Wikipedia.

The following notebook demonstrates how the differentiability of the solver can be leveraged to solve an otherwise tricky optimal control problem.

Showcase.ipynb ColabBadge

Motivation

The entire project is inspired by DiffTaichi paper, in particular the wave.py example. Besides exploring and having fun, the main goal is to benchmark differentiable Swift against the DiffTaichi framework. For the time being, the focus is only on single core CPU performance.

DiffTaichi uses an imperative programming language and transforms the Python source code in a way that preserves arithmetic intensity and parallelism. For my personal interests, DiffTaichi provides an excellent benchmark. It uses a very lightweight tape to record operations for the backward pass and focuses on physical simulations. The lightweight tape gives good performance in cases, where the studied algorithm operates on small matrices. This typically happens in physical system simulations. Tight loops and element-wise modification of matrices generally pose a significant performance bottleneck for traditional ML frameworks like PyTorch or TensorFlow 2. These frameworks were designed to run well in situations when the runtime is dominated by linear algebra operations on large matrices.

Swift for TensorFlow aims to solve these kinds of problems by integrating the differentiability directly into the compiler. This will eventually bring a wide palette of benefits, which are in depth described in this manifesto. However, as of version 0.10 the differentiability features are still new and to a large extent experimental. So before implementing a larger project it's worthwhile to spend some time and try to poke hole in the whole thing. First, this helps to better understand the strengths and weaknesses of the framework. And second, one can gain more experience how to implement different algorithms and ideas.

Goals

  1. Reimplement DiffTaichi's wave.py shallow water PDE solver example using only Foundation (i.e. [[Float]] or [[Double]]). This will serve two purposes. First, get myself familiar with the differentiable API. And second, serve as a baseline to make sure the other results are not completely off. Optionally, if the results don't make sense, in forward-only mode the Foundation-only implementation should have a similar performance as a C implementation.

  2. Reimplement DiffTaichi's wave.py with TensorFlow (i.e. Tensor<Float>) data types. I'm curious to see how it fares against the Foundation only version. Chances are it's going to be very slow do to eager mode. Hopefully compiling the time-stepping with XLA will make things better.

  3. Benchmark the Swift reimplementations against DiffTaichi on a simple problem that doesn't require differentiability. This will give a baseline for the expensiveness of the backward pass relative to the forward one.

  4. Benchmark solving an optimization problem that also exercises the backward pass. Ideally, the parameters like the the grid size and number of time-steps should stay the same so the runtimes of the backward and forward pass can be compared.

Implementation

Solution of the shallow water hyperbolic PDE is calculated with finite-difference discretization on unit square. Time-stepping uses semi implicit Euler's schema. Time step itself is limited to stay below the Courant–Friedrichs–Lewy numerical stability limit. Values around the edges of the domain are subject to trivial Dirichlet boundary conditions - i.e. they're equal to zero with an arbitrary normal derivative.

The bread and butter of the solver is the Laplace operator. Its evaluation is the bulk of the execution time. This particular implementation approximates the laplacian with five-point stencil finite-differencing.

The repository contains 3 different implementations of the same algorithm. The key difference is the underlying data storage of the discretized Laplace operator.

A and B are written in an imperative style that is very close to DiffTaichi. These are therefore the two most representative candidates for benchmarking. C was written more or less out of desperation from bugs encountered while working on A and B. It uses arithmetics on tensor slices in place of the 2D loops.

The discretized Laplace operator can be viewed as a convolution or also a special type of a Gaussian blur operation. D uses a fixed 3x3 single channel weight matrix and runs convolution on a 2D greyscale image. Shades of gray in this case represent the height of the water surface.

In some sense, solutions A and B reimplement convolution from scratch using loops. From the performance perspective, this is a very bad idea. The problem here is that not all discretizations of the laplacian can be written as a convolution. So this benchmark has some relevance for these alternative discretizations that operate on unstructured meshes. Typical examples of this is FVM (finite volume method) or FEM (finite element method).

Conclusion

The following table contains results of the benchmark on macOS S4TF development toolchain 2020-09-16-a. There's no conclusion yet.

name                                        time         std        iterations warmup      
-------------------------------------------------------------------------------------------
Shallow Water PDE Solver.C Loop              355.480 ms ±   4.35 %         10       nan ms
Shallow Water PDE Solver.Array Loop          665.574 ms ±   1.52 %         10  1298.180 ms
Shallow Water PDE Solver.Tensor Slice        536.672 ms ±   4.60 %         10  1201.721 ms
Shallow Water PDE Solver.Tensor Slice (XLA)  640.667 ms ±   1.91 %         10  1470.422 ms
Shallow Water PDE Solver.Tensor Conv        7502.167 ms ±   0.21 %         10 14969.662 ms
Shallow Water PDE Solver.Tensor Conv (XLA)   562.674 ms ±   0.25 %         10  1333.303 ms

More details and the benchmark source code can be found in the following notebook. Benchmarks.ipynb ColabBadge

The plan is to revisit the project again when the differentiability of coroutines is working. In particular, at the moment it's not possible to differentiate through indexed assignments like anArray[i] = someVal. Progress on these issues is tracked under the following links:

import Foundation
import Swim
// MARK: Solution of shallow water equation
/// Differentiable solution of shallow water equation on a unit square.
///
/// Shallow water equation is a type of hyperbolic partial differential equation (PDE). This struct
/// represents its solution calculated with finite-difference discretization on a 2D plane and at a
/// particular point in time.
///
/// More details about the shallow water PDE can found for example on
/// [Wikipedia](https://en.wikipedia.org/wiki/Shallow_water_equations)
///
/// # Domain and Discretization
/// The PDE is solved on a `<0,1>x<0,1>` square discretized with spatial step of size `Δx`.
/// Laplace operator is approximated with five-point stencil finite-differencing.
///
/// Temporal advancing uses semi implicit Euler's schema. Time step `Δt` is calculated from
/// `Δx` to stay below the Courant–Friedrichs–Lewy numerical stability limit.
///
/// # Boundary Conditions
/// Values around the edges of the domain are subject to trivial Dirichlet boundary conditions
/// (i.e. equal to 0 with an arbitrary gradient).
///
/// # Laplace Operator Δ
/// Discretization of the operator is implemented as tight loops over the water height field.
/// This is a very naive but natural implementation that serves as a performance baseline
/// on the CPU.
///
struct ArrayLoopSolution: ShallowWaterEquationSolution {
/// Water level height
var waterLevel: [[Float]] { u1 }
/// Solution time
var time: Float { t }
/// Height of the water surface at time `t`
private var u1: [[Float]]
/// Height of the water surface at previous time-step `t - Δt`
private var u0: [[Float]]
/// Solution time
@noDerivative private let t: Float
/// Speed of sound
@noDerivative private let c: Float = 340.0
/// Dispersion coefficient
@noDerivative private let α: Float = 0.001
/// Number of spatial grid points
@noDerivative private let resolution: Int = 256
/// Spatial discretization step
@noDerivative private var Δx: Float { 1 / Float(resolution) }
/// Time-step calculated to stay below the CFL stability limit
@noDerivative private var Δt: Float { (sqrt(α * α + Δx * Δx / 3) - α) / c }
/// Creates initial solution with water level `u0` at time `t`.
@differentiable
init(waterLevel u0: [[Float]], time t: Float = 0.0) {
self.u0 = u0
self.u1 = u0
self.t = t
precondition(u0.count == resolution)
precondition(u0.allSatisfy { $0.count == resolution })
}
/// Calculates solution stepped forward by one time-step `Δt`.
///
/// - `u0` - Water surface height at previous time step
/// - `u1` - Water surface height at current time step
/// - `u2` - Water surface height at next time step (calculated)
@differentiable
func evolved() -> ArrayLoopSolution {
var u2 = u1
for x in 1 ..< resolution - 1 {
for y in 1 ..< resolution - 1 {
// FIXME: Should be u2[x][y] = ...
u2.update(x, y, to:
2 * u1[x][y] +
(c * c * Δt * Δt + c * α * Δt) * Δ(u1, x, y) -
u0[x][y] - c * α * Δt * Δ(u0, x, y)
)
}
}
return ArrayLoopSolution(u0: u1, u1: u2, t: t + Δt)
}
/// Constructs intermediate solution with previous water level `u0`, current water level `u1` and time `t`.
@differentiable
private init(u0: [[Float]], u1: [[Float]], t: Float) {
self.u0 = u0
self.u1 = u1
self.t = t
precondition(u0.count == self.resolution)
precondition(u0.allSatisfy { $0.count == self.resolution })
precondition(u1.count == self.resolution)
precondition(u1.allSatisfy { $0.count == self.resolution })
}
/// Applies discretized Laplace operator to scalar field `u` at grid points `x` and `y`.
@differentiable
private func Δ(_ u: [[Float]], _ x: Int, _ y: Int) -> Float {
( u[x][y + 1]
+ u[x - 1][y] - (4 * u[x][y]) + u[x + 1][y] +
u[x][y - 1] ) / Δx / Δx
}
}
// MARK: - Cost calculated as mean L2 distance to a target image
extension ArrayLoopSolution {
/// Calculates mean squared error loss between the solution and a `target` grayscale image.
@differentiable
func meanSquaredError(to target: Swim.Image<Gray, Float>) -> Float {
precondition(target.width == resolution && target.height == resolution)
var mse: Float = 0.0
for x in 0 ..< resolution {
for y in 0 ..< resolution {
let error = target[x, y][.gray] - u1[x][y]
mse += error * error * Δx * Δx
}
}
return mse
}
}
// MARK: - Workaround for non-differentiable coroutines
// https://bugs.swift.org/browse/TF-1078
// https://bugs.swift.org/browse/TF-1080
fileprivate extension Array where Element == [Float] {
@differentiable(wrt: (self, value))
mutating func update(_ x: Int, _ y: Int, to value: Float) {
let _ = withoutDerivative(at: (value)) { value -> Int? in
self[x][y] = value
return nil
}
}
@derivative(of: update, wrt: (self, value))
mutating func vjpUpdate(_ x: Int, _ y: Int, to value: Float) ->
(value: (), pullback: (inout Array<[Float]>.TangentVector) -> Float) {
self.update(x, y, to: value)
func pullback(`self`: inout Array<[Float]>.TangentVector) -> Float {
let `value` = `self`[x][y]
`self`[x][y] = Float(0)
return `value`
}
return ((), pullback)
}
}
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
#include <math.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
const int n = 256;
const int duration = 512;
const float c = 340.0f;
const float alpha = 0.00001f;
int idx(const int x, const int y) {
return y * n + x;
}
float laplace(const float* const u, const int x, const int y) {
const float dx = 1.0f / n;
return (-4 * u[idx(x,y)] \
+ u[idx(x+1,y)] + u[idx(x-1,y)] \
+ u[idx(x,y+1)] + u[idx(x,y-1)]) / dx / dx;
}
void timestep(float* u, const float* const u1, const float* const u0) {
const float dx = 1.0f / n;
const float dt = (sqrt(alpha * alpha + dx * dx / 3.0f) - alpha) / c;
for (int x = 1; x < n-1; x++) {
for (int y = 1; y < n-1; y++) {
u[idx(x,y)] = 2 * u1[idx(x,y)] \
+ (c * c * dt * dt + c * alpha * dt) * laplace(u1, x, y) \
- u0[idx(x,y)] - c * alpha * dt * laplace(u0, x, y);
}
}
}
void splash() {
float* u = (float*) calloc(n*n, sizeof(float));
float* u1 = (float*) calloc(n*n, sizeof(float));
u1[idx(n/2, n/2)] = 100.0f;
float* u0 = (float*) calloc(n*n, sizeof(float));
u0[idx(n/2, n/2)] = 100.0f;
for (int t = 0; t < duration; t++) {
timestep(u, u1, u0);
float* tmp = u0;
u0 = u1;
u1 = u;
u = tmp;
}
free(u);
free(u1);
free(u0);
}
int main(int argc, const char * argv[]) {
printf("Warmup...\n");
for (int i = 0; i < 2; i++) {
splash();
}
printf("Benchmarks...\n");
for (int i = 0; i < 10; i++) {
clock_t start = clock();
splash();
long double milisecs = (long double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;
printf("%Lf ms\n", milisecs);
}
return 0;
}
Display the source blob
Display the rendered blob
Raw
Sorry, this is too big to display.
View raw

(Sorry about that, but we can’t show files that are this big right now.)

View raw

(Sorry about that, but we can’t show files that are this big right now.)

View raw

(Sorry about that, but we can’t show files that are this big right now.)

View raw

(Sorry about that, but we can’t show files that are this big right now.)

View raw

(Sorry about that, but we can’t show files that are this big right now.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment