We show that the CFM (contrastive flow matching) objective is not fundamentally different from FM, because it amounts to an affine transformation of the target velocity, which could be learned post hoc.
Assumptions:
- $\alpha_t = 1 - t,\ \sigma_t = t \Rightarrow v = -x_i + \epsilon_i,\ \tilde v = -\tilde x + \tilde\epsilon$
- $(\tilde x,\tilde\epsilon)$ is drawn i.i.d. from the dataset, independent of $(x_i,\epsilon_i)$
- Sampled noise has zero mean: $\mathbb{E}[\tilde\epsilon] = 0$
- Let $\mu_x := \mathbb{E}[\tilde x]$ (empirical average $\bar x$ in practice)
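
Taken together, these assumptions give $\mathbb{E}[\tilde v] = -\mathbb{E}[\tilde x] + \mathbb{E}[\tilde\epsilon] = -\mu_x$ by linearity of expectation. The following is a minimal numerical sanity check of that identity, not part of the argument itself. The toy 2-D Gaussian "dataset", the sample size, and all variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "dataset": 100k points in 2-D with a non-zero mean.
x_tilde = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(100_000, 2))
# Zero-mean noise, drawn independently of the data.
eps_tilde = rng.standard_normal(size=x_tilde.shape)

# Negative-pair target velocity under alpha_t = 1 - t, sigma_t = t.
v_tilde = -x_tilde + eps_tilde

mu_x = x_tilde.mean(axis=0)          # empirical average \bar x
mean_v_tilde = v_tilde.mean(axis=0)  # Monte Carlo estimate of E[v~]

# Largest coordinate-wise deviation between E[v~] and -mu_x.
# It equals the sample mean of the noise, so it shrinks like 1/sqrt(N).
gap = np.abs(mean_v_tilde + mu_x).max()
print(gap)
```

The printed gap should be small relative to the scale of $\mu_x$, reflecting only the sampling error in the noise mean.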