I've been reading this much-publicized paper on neural ordinary differential equations:
https://arxiv.org/abs/1806.07366
I found their presentation of the costate/adjoint method to be lacking in intuition and notational clarity, so I decided to write up my own tutorial treatment. I'm familiar with this material from its original setting in optimal control theory.
You have a dynamical system described by an autonomous first-order ODE, x' = f(x), where the state x belongs to an n-dimensional vector space. There is a value function V(x) defined over the state space. Given a particular path t -> x(t) satisfying the ODE, we may evaluate the path at the terminal time T to get the terminal state x(T), and from it the terminal value V(x(T)).
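To make the setup concrete, here is a minimal sketch in plain Python that integrates x' = f(x) forward from an initial state and evaluates the value function at the terminal state. The specific f (a harmonic oscillator), the specific V (half the squared norm), the initial state x0, and the fixed-step RK4 integrator are all my own illustrative assumptions, not anything taken from the paper.

```python
import math

def f(x):
    # Assumed example dynamics: harmonic oscillator, x = (position, velocity).
    return [x[1], -x[0]]

def V(x):
    # Assumed example value function: half the squared norm of the state.
    return 0.5 * sum(xi * xi for xi in x)

def rk4_step(f, x, h):
    # One classical fourth-order Runge-Kutta step for the autonomous ODE x' = f(x).
    k1 = f(x)
    k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
    return [
        xi + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]

def terminal_state(f, x0, T, steps=1000):
    # Integrate from t = 0 to t = T with a fixed step size and return x(T).
    h = T / steps
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(f, x, h)
    return x

x0 = [1.0, 0.0]          # assumed initial state
T = math.pi / 2          # terminal time
xT = terminal_state(f, x0, T)
print("x(T) =", xT)
print("V(x(T)) =", V(xT))
```

For this particular f the exact path is x(t) = (cos t, -sin t), so x(T) should come out close to (0, -1) and V(x(T)) close to 0.5. The point is only that the terminal value is a function of the initial state by way of the flow of the ODE.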