dmwit · July 8, 2019 21:25 · conal · Jul 8, 2019
diff --git a/Dd.txt b/Dd.txt
 In the Essence of Automatic Differentiation, Conal defines

 D+ :: (a -> b) -> a -> (b, a -o b)
 D+(f,a) = (f(a), D(f,a))

 where a and b are vector spaces, -o is the space of linear functions, and D is
 the derivative operation.

 But to me this suffers from the same problem as treating numbers as the
 derivatives of functions on 1D spaces, vectors/matrices in higher dimensions,
 etc.: it mistakes the representation for the denotation. I think the
 denotation should be an affine function that encodes both the b and the
 a -o b. Concretely, I propose dmwit's enriched version of D; call it Dd to
 disambiguate:

 Dd :: (a -> b) -> a -> (a -a b)
 Dd(f,a) = \a' -> D(f,a)(a'-a) + f(a)
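
 To make the "encodes both" claim concrete, here is a minimal Haskell sketch
 of the 1-D case (a = b = Double). The Diff1 record, its field names, and
 fromAffine are hypothetical names introduced for illustration, and the
 derivative is supplied by hand rather than computed.

```haskell
-- Sketch only: 1-D case (a = b = Double). Diff1 and its fields are
-- hypothetical; the derivative is written by hand, not derived.
data Diff1 = Diff1
  { value :: Double -> Double  -- f
  , slope :: Double -> Double  -- f', so D(f,a) is \da -> slope f a * da
  }

-- D+(f,a) = (f(a), D(f,a)), with the linear map written as a function.
dPlus :: Diff1 -> Double -> (Double, Double -> Double)
dPlus f a = (value f a, \da -> slope f a * da)

-- Dd(f,a) = \a' -> D(f,a)(a'-a) + f(a): the affine approximation of f at a.
dd :: Diff1 -> Double -> (Double -> Double)
dd f a = \a' -> slope f a * (a' - a) + value f a

-- Recover both components of D+ from the affine function h.
-- The linear part needs only h; the value f(a) also needs the base point a.
fromAffine :: (Double -> Double) -> Double -> (Double, Double -> Double)
fromAffine h a = (h a, \da -> h da - h 0)

-- Example function: f(x) = x^2, with f'(x) = 2x.
square :: Diff1
square = Diff1 (\x -> x * x) (\x -> 2 * x)
```

 For example, fromAffine (dd square 3) 3 yields the value 9 and a linear map
 sending 1 to 6, which matches dPlus square 3.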

 where -a is the space of affine functions. Why is this the right choice of
 additions and subtractions? Because we can verify that the chain rule is now
 plain function composition in the semantic domain:

 Dd(g.f, a) = Dd(g,f(a)) . Dd(f,a)
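
 The verification is a short calculation. The sketch below unfolds the
 definition of Dd and assumes the ordinary chain rule
 D(g.f, a) = D(g,f(a)) . D(f,a); the subscript notation D_d is just Dd
 typeset.

```latex
\begin{align*}
\bigl(D_d(g, f(a)) \circ D_d(f, a)\bigr)(a')
  &= D(g, f(a))\bigl(D_d(f, a)(a') - f(a)\bigr) + g(f(a)) \\
  &= D(g, f(a))\bigl(D(f, a)(a' - a)\bigr) + g(f(a)) \\
  &= D(g \circ f, a)(a' - a) + (g \circ f)(a) \\
  &= D_d(g \circ f, a)(a')
\end{align*}
```

 The second step is where the additions and subtractions pay off: the f(a)
 added by the inner Dd is exactly the base point subtracted by the outer one.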

 The defining condition for the derivative is also quite beautiful when
 written in terms of Dd:

 lim      ||f(x') - Dd(f,x)(x')||
 x'->x     -------------------------    = 0
                 ||x'-x||

 It reads directly as saying that, near x, Dd(f,x) is approximately the same
 function as f.
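
 As a rough numerical illustration of that limit, the following Haskell
 sketch evaluates the quotient for one concrete function. The choice
 f(x) = x^2, its hand-written derivative, and the names ddF, errorRatio, and
 ratios are all assumptions made for this example.

```haskell
-- Sketch only: 1-D case with a hand-written derivative (an assumption of
-- this example, not something the post specifies).
f :: Double -> Double
f x = x * x

f' :: Double -> Double
f' x = 2 * x

-- Dd(f,x) = \x' -> D(f,x)(x'-x) + f(x), specialised to the f above.
ddF :: Double -> (Double -> Double)
ddF x = \x' -> f' x * (x' - x) + f x

-- The quotient from the limit: |f(x') - Dd(f,x)(x')| / |x' - x|.
errorRatio :: Double -> Double -> Double
errorRatio x x' = abs (f x' - ddF x x') / abs (x' - x)

-- Quotients for x' = x + h with h = 0.1, 0.01, 0.001 at x = 3.
ratios :: [Double]
ratios = [errorRatio 3 (3 + h) | h <- [0.1, 0.01, 0.001]]
```

 For this f the numerator is (x'-x)^2, so each quotient is roughly h itself
 and shrinks toward 0 as x' approaches x.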