Last active
September 27, 2024 00:05
Matrix derivatives via Frobenius norm
# Automatic matrix derivatives: http://www.matrixcalculus.org/
# A good primer on basic matrix calculus: https://atmos.washington.edu/~dennis/MatrixCalculus.pdf
# The Matrix Reference Manual: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html#Intro
# Trying to understand the derivative of the inverse: https://math.stackexchange.com/questions/1471825/derivative-of-the-inverse-of-a-matrix
# Derivative of the pseudoinverse:
https://math.stackexchange.com/questions/2179160/derivative-of-pseudoinverse-with-respect-to-original-matrix
https://mathoverflow.net/questions/25778/analytical-formula-for-numerical-derivative-of-the-matrix-pseudo-inverse
https://mathoverflow.net/questions/264130/derivative-of-pseudoinverse-with-respect-to-original-matrix/264426
https://math.stackexchange.com/questions/1689434/derivative-of-the-frobenius-norm-of-a-pseudoinverse-matrix
# Math Stack Exchange user john316 does derivatives with Frobenius norms ( https://math.stackexchange.com/users/262158/john316 ):
https://math.stackexchange.com/questions/1689434/derivative-of-the-frobenius-norm-of-a-pseudoinverse-matrix
https://math.stackexchange.com/questions/1405922/what-is-the-gradient-of-f-s-abat-2/1406290#1406290
https://math.stackexchange.com/questions/946911/minimize-the-frobenius-norm-of-the-difference-of-two-matrices-with-respect-to-ma/1474048#1474048
# Math Stack Exchange user greg also does derivatives with Frobenius norms ( https://math.stackexchange.com/users/357854/greg ):
https://math.stackexchange.com/questions/2444284/matrix-derivative-of-frobenius-norm-with-hadamard-product-inside
https://math.stackexchange.com/questions/1890313/derivative-wrt-to-kronecker-product/1890653#1890653
https://math.stackexchange.com/questions/2125499/second-derivative-of-det-sqrtftf-with-respect-to-f/2125849#2125849
# Some matrix calculus:
Practical Guide to Matrix Calculus for Deep Learning (Andrew Delong)
http://www.psi.toronto.edu/~andrew/papers/matrix_calculus_for_learning.pdf
# Properties (: is Frobenius inner product, ⊙ is element-wise Hadamard product, ⋅ is matrix multiplication, ᵀ is transpose):
A:B=B:A
A:(B+C)=A:B + A:C
A:B=Aᵀ:Bᵀ
A⊙B=B⊙A
A:(B⊙C)=(A⊙B):C
tr(Aᵀ⋅B) = tr(A⋅Bᵀ) = tr(Bᵀ⋅A) = tr(B⋅Aᵀ) = A:B
A:(B⋅C) = (Bᵀ⋅A):C = (A⋅Cᵀ):B
d(X:Y) = (dX):Y + X:(dY)
d(X:X) = dX:X + X:dX = 2X:dX
d(X⊙Y) = (dX)⊙Y + X⊙(dY)
d(X⋅Y) = (dX)⋅Y + X⋅(dY)
d(Xᵀ) = (dX)ᵀ
dZ/dX = dZ/dY ⋅ dY/dX
d(inv(X)) = -inv(X)⋅dX⋅inv(X)
vec_column( A⋅B⋅C ) = ( Cᵀ kronecker A ) ⋅ vec_column( B )
vec_row( A⋅B⋅C ) = ( A kronecker Cᵀ ) ⋅ vec_row( B )
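
Several of the properties above are easy to sanity-check numerically. The following is a minimal NumPy sketch, not part of the original notes: the helper names `frob`, `vec_column`, and `vec_row`, the matrix shapes, and the random seed are all assumptions chosen for illustration. It assumes `vec_column` stacks columns (Fortran order) and `vec_row` stacks rows (C order).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

def frob(A, B):
    """Frobenius inner product A:B, the sum of element-wise products."""
    return np.sum(A * B)

def vec_column(M):
    """Stack the columns of M into one vector (Fortran order)."""
    return M.reshape(-1, order="F")

def vec_row(M):
    """Stack the rows of M into one vector (C order)."""
    return M.reshape(-1, order="C")

checks = {}

# Same-shaped matrices for the trace, transpose, and Hadamard identities.
A, B, C = (rng.standard_normal((3, 4)) for _ in range(3))
checks["A:B = tr(A^T B)"] = np.isclose(frob(A, B), np.trace(A.T @ B))
checks["A:B = A^T:B^T"] = np.isclose(frob(A, B), frob(A.T, B.T))
checks["A:(B.C) Hadamard"] = np.isclose(frob(A, B * C), frob(A * B, C))

# Conformable shapes for A:(B C) = (B^T A):C = (A C^T):B.
P = rng.standard_normal((3, 5))  # plays the role of A
Q = rng.standard_normal((3, 4))  # plays the role of B
R = rng.standard_normal((4, 5))  # plays the role of C
checks["A:(BC) = (B^T A):C"] = np.isclose(frob(P, Q @ R), frob(Q.T @ P, R))
checks["A:(BC) = (A C^T):B"] = np.isclose(frob(P, Q @ R), frob(P @ R.T, Q))

# d(inv(X)) = -inv(X) dX inv(X), checked to first order with a small dX.
# Adding 10*I keeps X comfortably invertible for this illustration.
X = rng.standard_normal((4, 4)) + 10.0 * np.eye(4)
dX = 1e-6 * rng.standard_normal((4, 4))
Xinv = np.linalg.inv(X)
actual = np.linalg.inv(X + dX) - Xinv
predicted = -Xinv @ dX @ Xinv
rel_err = np.linalg.norm(actual - predicted) / np.linalg.norm(predicted)
checks["d(inv(X))"] = rel_err < 1e-4

# vec identities with the Kronecker product.
F = rng.standard_normal((2, 3))
G = rng.standard_normal((3, 4))
H = rng.standard_normal((4, 5))
checks["vec_column"] = np.allclose(vec_column(F @ G @ H),
                                   np.kron(H.T, F) @ vec_column(G))
checks["vec_row"] = np.allclose(vec_row(F @ G @ H),
                                np.kron(F, H.T) @ vec_row(G))

for name, ok in checks.items():
    print(name, bool(ok))
```

The inverse check compares a finite perturbation against the first-order formula, so it only agrees up to terms of second order in dX; that is why it uses a relative tolerance rather than exact equality.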
# Example
E = ‖A⋅x - b‖² = M : M, where M = A⋅x - b
dE = 2M : dM
dM = dA⋅x + A⋅dx - db
[To compute dE/dx, set the other differentials (dA and db) to 0 and isolate dx] dE = 2M : A⋅dx = 2 Aᵀ⋅M : dx <=> dE/dx = 2 Aᵀ⋅( A⋅x - b )
[You can compute dE/dA, which we don't usually do, just as easily. Set the other differentials (dx and db) to 0 and isolate dA] dE = 2M : dA⋅x = 2 M⋅xᵀ : dA <=> dE/dA = 2 ( A⋅x - b )⋅xᵀ
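
The two gradients derived in the example can be compared against central finite differences. This is a hedged sketch rather than anything from the original notes: the helper names `energy` and `finite_difference`, the problem sizes, the step `h`, and the tolerance are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
b = rng.standard_normal(5)

def energy(A, x, b):
    """E = M : M with M = A x - b (the squared 2-norm of the residual)."""
    M = A @ x - b
    return M @ M

# Closed-form gradients from the derivation above.
M = A @ x - b
grad_x = 2.0 * (A.T @ M)        # dE/dx = 2 A^T (A x - b)
grad_A = 2.0 * np.outer(M, x)   # dE/dA = 2 (A x - b) x^T

def finite_difference(f, X, h=1e-6):
    """Central-difference gradient of the scalar function f at the array X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = h
        G[idx] = (f(X + step) - f(X - step)) / (2.0 * h)
    return G

fd_x = finite_difference(lambda v: energy(A, v, b), x)
fd_A = finite_difference(lambda W: energy(W, x, b), A)

x_ok = np.allclose(grad_x, fd_x, atol=1e-5)
A_ok = np.allclose(grad_A, fd_A, atol=1e-5)
print("dE/dx matches finite differences:", x_ok)
print("dE/dA matches finite differences:", A_ok)
```

Because E is quadratic in both x and A, central differences are exact up to floating-point roundoff, so a mismatch here would point to an error in the closed-form expression rather than in the step size.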