==== Integration by parts ====
$\int_a^b (p\cdot f)'\,dx = \left[p\cdot f\right]_a^b = (p\cdot f)(b) - (p\cdot f)(a)$
$\implies$
$\int_a^b p\cdot f'\,dx = -\int_a^b p'\cdot f\,dx + (p\cdot f)(b) - (p\cdot f)(a)$
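As a quick numerical sanity check, here is a sketch (the choices $p(x) = e^{-x^2}$ and $f(x) = \sin x$ are hypothetical, purely for illustration) comparing both sides by trapezoidal quadrature:

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal quadrature of sampled values y over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

a, b = -1.0, 2.0
x = np.linspace(a, b, 200_001)

p = np.exp(-x**2)            # illustrative p (normalization is irrelevant here)
f = np.sin(x)                # illustrative f
dp = -2 * x * np.exp(-x**2)  # p'
df = np.cos(x)               # f'

lhs = trapz(p * df, x)                                 # int_a^b p * f' dx
rhs = -trapz(dp * f, x) + p[-1] * f[-1] - p[0] * f[0]  # -int p' f dx + boundary
print(lhs, rhs)  # agree up to quadrature error
```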
=== Score ===
$s_{\theta^*} := \frac{p'}{p} = \frac{d}{dx} \log p(x)$
Note that $p' = s_{\theta^*}\cdot p$.
For convenience, write $g$ for $g(x)$ when the argument is clear from context.
The notation comes from an ML context: assuming this score can be learned, we fix parameters $\theta$, with $\theta^*$ being an optimum. Having $s_{\theta^*}$,
we can follow it to effectively swim towards $p$'s extrema (and reconstruct $p$'s neighborhood relative to that extremum).
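For a concrete instance (an assumption for illustration, not from the text): if $p = \mathcal{N}(\mu, \sigma^2)$, then $s_{\theta^*}(x) = -(x-\mu)/\sigma^2$. A minimal sketch checking $p' = s_{\theta^*}\cdot p$ numerically:

```python
import numpy as np

mu, sigma = 1.0, 0.5  # hypothetical Gaussian parameters

def p(x):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def score(x):
    # s_{theta*}(x) = d/dx log p(x) = -(x - mu) / sigma^2 for a Gaussian
    return -(x - mu) / sigma**2

x = np.linspace(-1.0, 3.0, 9)
dp = (p(x + 1e-6) - p(x - 1e-6)) / 2e-6             # central-difference p'
print(np.allclose(dp, score(x) * p(x), atol=1e-6))  # p' = s * p
```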
=== Probability measure variant ===
$\int_a^b p\cdot f'\,dx = -\int_a^b p'\cdot f\,dx + (p\cdot f)(b) - (p\cdot f)(a)$
Let $\mu$ be the probability measure on $[a,b]$ with density $p$, i.e. $d\mu = p\,dx$.
$E_\mu[\frac{d}{dx}f] = E_\mu[-s_{\theta^*} \cdot f] + (p\cdot f)(b) - (p\cdot f)(a)$
or, collecting both terms,
$E_\mu\left[\left(s_{\theta^*} + \frac{d}{dx}\right) f\right] = (p\cdot f)(b) - (p\cdot f)(a)$
If either $f$ or $p$ is zero at the bounds, the right-hand side is zero.
Then, in expectation, multiplication by $-s_{\theta^*}$ acts as differentiation: $E_\mu[f'] = E_\mu[-s_{\theta^*}\cdot f]$.
Knowing $s_{\theta^*}$ encodes having already taken the derivative (but of $p$),
and the derivative of any $f$ can then, in expectation, be computed by local point evaluation.
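A Monte Carlo sketch of this identity, under assumptions not in the text: $\mu = \mathcal{N}(0,1)$ on all of $\mathbb{R}$ (so the boundary terms vanish and $s_{\theta^*}(x) = -x$), with the hypothetical test function $f(x) = x^3$:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.standard_normal(1_000_000)  # samples from mu = N(0, 1)

f = xs**3        # test function f(x) = x^3
df = 3 * xs**2   # f'(x) = 3x^2

# E[f'] = 3 E[x^2] = 3, and E[-s f] = E[x * x^3] = E[x^4] = 3
print(df.mean())        # ~3
print((xs * f).mean())  # ~3 (here -s(x) = x)
```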
=== Special case ===
Let $f$ be a translate of the density itself, i.e. $f(x) = p(x-d)$. Then
$E_\mu[p'(x-d)] = \int \left(-\frac{p'(x)}{p(x)}\right) p(x-d)\, p(x)\,dx = \int -p'(x)\, p(x-d)\,dx = E_\mu[-p'(x+d)],$
where the last equality substitutes $x \mapsto x + d$.
(See 4 drawings)
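A Monte Carlo sketch of this translation identity (assuming, for illustration, $p = \mathcal{N}(0,1)$ and an arbitrary shift $d = 0.7$):

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.standard_normal(2_000_000)  # samples from mu, with density p = N(0,1)
d = 0.7                              # arbitrary shift

def dp(x):
    # p'(x) = -x p(x) for the standard normal density
    return -x * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print(dp(xs - d).mean())     # E_mu[p'(x - d)]
print((-dp(xs + d)).mean())  # E_mu[-p'(x + d)]; the two should agree
```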
==== Optimization ====
To find $\theta$, we want to minimize
$E_\mu\left[\Vert s_{\theta} - s_{\theta^*} \Vert^2\right] = E_\mu\left[s_{\theta}^2 - 2 s_{\theta^*} s_{\theta} + s_{\theta^*}^2\right] = E_\mu\left[\left(s_{\theta} + 2\frac{d}{dx}\right) s_{\theta}\right] + c,$
where the cross term becomes $2 E_\mu[s_\theta']$ by the identity above (taking $f = s_\theta$, with vanishing boundary terms), and $c = E_\mu[s_{\theta^*}^2]$ is a nonnegative constant independent of $\theta$ (by the same identity, $c = -E_\mu[s_{\theta^*}']$, minus the mean score gradient of the true distribution itself).
These calculations generalize from $\frac{d}{dx}$ to $\nabla_x$.
Note we have replaced required knowledge of the functional form of $s_{\theta^*}$ with
the need to compute the derivative $\frac{d}{dx}$ of our own model $s_\theta$.
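A minimal end-to-end sketch of this objective, under assumptions not in the text: samples from $\mathcal{N}(\mu, \sigma^2)$ and a hypothetical linear model $s_\theta(x) = \theta_1 x + \theta_0$ (so $s_\theta' \equiv \theta_1$). Training minimizes $E[s_\theta^2 + 2 s_\theta']$ and never evaluates $s_{\theta^*}$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 0.5
xs = mu + sigma * rng.standard_normal(100_000)  # data; the true score is never used

theta0, theta1 = 0.0, 0.0
lr = 0.05
for _ in range(2_000):
    s = theta1 * xs + theta0
    # gradients of the empirical objective E[s^2 + 2*theta1]
    g0 = 2 * np.mean(s)
    g1 = 2 * np.mean(s * xs) + 2
    theta0 -= lr * g0
    theta1 -= lr * g1

# The true score of N(mu, sigma^2) is -(x - mu)/sigma^2, i.e. slope -4, intercept 4
print(theta1, -1 / sigma**2)  # -> approximately -4
print(theta0, mu / sigma**2)  # -> approximately 4
```

Up to sampling noise, the fit recovers $s_{\theta^*}(x) = -(x-\mu)/\sigma^2$, illustrating that the objective only requires derivatives of the model itself.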