Neural Networks as Quantum Field Theories and basic Neural Tangent Kernel Theory
Below is the sketch of my script for the video at:
https://youtu.be/ZSmORp3Bm2c
Paper discussed:
- https://arxiv.org/pdf/2307.03223.pdf
Large width network theory:
- https://en.wikipedia.org/wiki/Universal_approximation_theorem
- https://en.wikipedia.org/wiki/Gaussian_process
- https://en.wikipedia.org/wiki/Neural_network_Gaussian_process
- Neural Tangent Kernel theory:
+ https://en.wikipedia.org/wiki/Neural_tangent_kernel
+ https://github.com/google/neural-tangents
+ https://lilianweng.github.io/posts/2022-09-08-ntk/
+ https://github.com/google/neural-tangents/blob/main/presentation/neurips_linearization_poster.pdf
Information geometry:
- https://en.wikipedia.org/wiki/Information_geometry
QFT:
- https://en.wikipedia.org/wiki/Lagrangian_(field_theory)#Scalar_field_theory
- https://en.wikipedia.org/wiki/Schwinger_function
- https://en.wikipedia.org/wiki/Propagator
- https://en.wikipedia.org/wiki/Correlation_function_(quantum_field_theory), $G=\langle v\mid \phi\cdots\phi\mid v\rangle$
- https://en.wikipedia.org/wiki/Partition_function_(quantum_field_theory), $Z[J]$, whose expansion is a sum of products of $G$'s; view it as a generating functional and connect it to the path integral
- https://en.wikipedia.org/wiki/Quantum_field_theory
- https://en.wikipedia.org/wiki/LSZ_reduction_formula
===== Neural network Gaussian processes and neural tangent kernel =====
Briefly discussed in this video:
$\bullet$ Neural Network Gaussian Processes (NNGPs)
$\bullet$ Neural Tangent Kernel theory
$\bullet$ Quantum field theory motivation/reminders
$\bullet$ ANN-for-QFT and QFT-for-ANN
==== NNGPs ====
$\bullet$ Paper: https://arxiv.org/pdf/2307.03223.pdf
$\bullet$ For simplicity, consider a fully connected network with one hidden layer.
- E.g. https://www.gabormelli.com/RKB/File:2NNw.png
$\bullet$ Large width limit
- https://en.wikipedia.org/wiki/Universal_approximation_theorem
- and more, as you see in this video
$\bullet$ https://en.wikipedia.org/wiki/Gaussian_process
$\bullet$ https://en.wikipedia.org/wiki/Neural_network_Gaussian_process
- Neal '95
$\bullet$ (1.2) => (1.3) using e.g. (1.7) (note: zero mean), giving (1.11) and, more specifically, (1.13); a numerical sketch of this Gaussian limit follows at the end of this section
$\bullet$ Non-Gaussian, non-zero-cumulant effects emerge either from not taking the infinite width limit,
or alternatively from violating the statistical assumptions of the central limit theorem.
The latter can in practice be done by making the random sampling of certain parameters (weight components)
depend on already sampled parameters.
(But note that with the finite-width approach, the universal approximation property will generally also break.)
$\bullet$ https://en.wikipedia.org/wiki/Lagrangian_(field_theory)#Scalar_field_theory
- see scalar field theory
$\bullet$ Wick rotation, $i^2=-1$,
- https://en.wikipedia.org/wiki/Schwinger_function
- Osterwalder–Schrader axioms
$\bullet$ Random initialization => field theory sampling
$\bullet$ In the other direction: use e.g. Feynman diagrams to compute information about the typical random initialization
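To make the NNGP statement above concrete, here is a minimal numerical sketch (my own illustration, not code from the paper): sample many random initializations of a one-hidden-layer ReLU network with the standard $1/\sqrt{\rm width}$ output scaling, and compare the empirical covariance of the outputs at two inputs with the closed-form Gaussian expectation $E_w[{\rm ReLU}(w\cdot x_1)\,{\rm ReLU}(w\cdot x_2)]$ (the degree-1 arc-cosine kernel). Width, sample count, and inputs are arbitrary choices.
```python
import jax
import jax.numpy as jnp

width = 2048       # hidden width; the Gaussian behavior sharpens as this grows
n_samples = 5000   # number of independent random initializations

# Two unit-norm inputs with angle theta between them.
x1 = jnp.array([1.0, 0.0])
x2 = jnp.array([0.6, 0.8])

def f(W, v, x):
    # One-hidden-layer ReLU network with 1/sqrt(width) output scaling,
    # the normalization under which the NNGP limit is taken.
    return jnp.dot(v, jax.nn.relu(W @ x)) / jnp.sqrt(width)

def sample_outputs(key):
    kW, kv = jax.random.split(key)
    W = jax.random.normal(kW, (width, 2))   # i.i.d. N(0,1) weights
    v = jax.random.normal(kv, (width,))
    return f(W, v, x1), f(W, v, x2)

keys = jax.random.split(jax.random.PRNGKey(0), n_samples)
f1, f2 = jax.vmap(sample_outputs)(keys)

# Empirical covariance over initializations vs. the closed form
# E_w[relu(w.x1) relu(w.x2)] = (sin(theta) + (pi - theta) cos(theta)) / (2 pi)
# for unit-norm inputs (degree-1 arc-cosine kernel).
theta = jnp.arccos(jnp.clip(jnp.dot(x1, x2), -1.0, 1.0))
analytic = (jnp.sin(theta) + (jnp.pi - theta) * jnp.cos(theta)) / (2 * jnp.pi)
print(float(jnp.mean(f1 * f2)), float(analytic))  # these should roughly agree
```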
==== Sidenote on learning ====
$\bullet$ Learning => fulfill a task
$\bullet$ Gradient descent
- https://egallic.fr/Enseignement/ML/ECB/book/figs/example_two_dim/descent_2D_sphere.gif
$\bullet$ As is common in stochastic control, one may take the step sizes to a very fine limit, i.e. pass to calculus proper (gradient flow):
$\theta'(t) = -\nabla_\theta\Phi$
with
$\Phi := \sum_z C(f_{\theta(t)}(z), y_z)$
$z$ ... training data
$\bullet$ Compute the time development of the output $f$. This ends up looking a lot like the equations of motion of Hamiltonian mechanics.
$\bullet$ https://en.wikipedia.org/wiki/Neural_tangent_kernel#Details
- Analytical solutions are possible in the NNGP limit ($\Theta$ constant etc.); see the neural-tangents sketch after this list
+ People currently study to what extent this remains valid also at finite width
- https://github.com/google/neural-tangents
- https://lilianweng.github.io/posts/2022-09-08-ntk/
- https://github.com/google/neural-tangents/blob/main/presentation/neurips_linearization_poster.pdf
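The neural-tangents library linked above packages exactly this: analytic NNGP/NTK kernels of the infinite-width network and the closed-form gradient-flow solution for MSE loss. A minimal sketch along the lines of the library's README (the toy data here is made up):
```python
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width fully connected network with one hidden layer, as above.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1))

x_train = random.normal(random.PRNGKey(1), (10, 2))
y_train = jnp.sin(x_train[:, :1])                 # toy regression targets
x_test = random.normal(random.PRNGKey(2), (5, 2))

# Analytic NNGP and NTK kernels of the infinite-width network.
kernels = kernel_fn(x_train, x_train, ('nngp', 'ntk'))

# Closed-form distribution of the network ensemble after gradient flow
# on the MSE loss (Theta is constant in this limit, so this is exact).
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train)
mean, cov = predict_fn(x_test=x_test, get='ntk', compute_cov=True)
print(mean.shape, cov.shape)   # predictive mean and covariance on x_test
```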
==== QFT motivation (micro crash course) ====
$\bullet$ QM reminder: the square root of chance (amplitude) for a finite time evolution is given by a pairing in Hilbert space
+ $\langle out\mid e^{-itH} \mid in \rangle =: K$, i.e. the pairing of $\mid out\rangle$ and $e^{-itH} \mid in\rangle$, i.e. the propagated $\mid in\rangle$.
+ $e^{-itH}$ solves the Schrödinger equation; $H$ infinitesimally generates the evolution, $i\partial_t \Psi = H\Psi$
$\bullet$ $G$ ... the 2-point function/propagator/Green's function solves the differential equation with a delta peak as source.
- Often, roughly, $G=\Theta\cdot K$ ($\Theta$ ... here just a Heaviside function)
+ https://en.wikipedia.org/wiki/Propagator#Basic_examples:_propagator_of_free_particle_and_harmonic_oscillator
$\bullet$ More statistical view for QFT:
- https://en.wikipedia.org/wiki/Correlation_function_(quantum_field_theory), $G=\langle v\mid \phi\cdots\phi\mid v\rangle$
- https://en.wikipedia.org/wiki/Partition_function_(quantum_field_theory), $Z[J]$, whose expansion is a sum of products of $G$'s; view it as a generating functional and connect it to the path integral (see the worked free-field identities at the end of this section)
$\bullet$ LSZ ultra-roughly:
- Interested in $K$ ... operator-valued distribution formalism ... $K$ is a function of $\langle p_{out}\mid \phi\cdots\phi\mid p_{in}\rangle$.
- Energy comes in through the field operator expression
+ $E = \hbar\omega = mc^2\sqrt{1+\left(\frac{\vec p}{mc}\right)^2} \approx mc^2 + \tfrac{1}{2}m{\vec v}^2$
+ $\phi(t, \vec x) \sim \int d{\vec p}\, \omega^{-1/2}\, e^{i(-\omega t + {\vec p}\cdot{\vec x})}\, a_{\vec p}^*$ + c.c.
- https://en.wikipedia.org/wiki/Quantum_field_theory#Path_integrals
+ LSZ: $\langle p_{out}\mid \phi\cdots\phi\mid p_{in}\rangle = \prod_{i=1}^{n_{out}+n_{in}} \Delta(p_i, m_i)\, {\mathcal F} G$
- https://en.wikipedia.org/wiki/LSZ_reduction_formula
$\bullet$ Back to the paper
- E.g. tinkering with parameter-level non-Gaussianity in (4.13)
- §4.2, "φ^4 Theory as a Neural Network Field Theory": break the product form
+ See (4.21)-(4.25)
+ Then the insertion (4.29)
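For reference, the textbook free (Gaussian) scalar field identities behind the rough $Z[J]$ formula above, written out (standard material, not specific to the paper):
```latex
% Euclidean free scalar field: Z[J] is Gaussian in J, G is the Green's
% function of the kinetic operator, and Wick's theorem reduces all
% correlators to sums of products of G's.
\begin{align*}
  Z[J] &= \int \mathcal{D}\phi\;
    e^{-\int d^dx\,\left(\frac{1}{2}(\partial\phi)^2
      + \frac{1}{2}m^2\phi^2 - J\phi\right)}
    \;\propto\; e^{\frac{1}{2}\int d^dx\,d^dy\; J(x)\,G(x-y)\,J(y)} \\
  (-\partial^2 + m^2)\,G(x-y) &= \delta^{(d)}(x-y) \\
  \langle \phi(x_1)\cdots\phi(x_n)\rangle
    &= \frac{1}{Z[0]}
       \frac{\delta^n Z[J]}{\delta J(x_1)\cdots\delta J(x_n)}\Big|_{J=0}
     = \sum_{\text{pairings}}\;\prod_{\text{pairs}\,(i,j)} G(x_i - x_j)
\end{align*}
```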
==== Sidenote on related topics ====
$\bullet$ There are also attempts at studying manifold metric flows via an NN parametrization, including parametrizations for physics.
$\bullet$ Of course, not to be confused with the above, there is also 'conventional information geometry', where the network's parameter space is studied as a **neuromanifold** (see the sketch below)
- https://en.wikipedia.org/wiki/Information_geometry
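As a toy illustration of the neuromanifold idea (my own sketch, not from any of the linked sources): for a network with Gaussian output model $p_\theta(y\mid x)=\mathcal{N}(f_\theta(x),1)$, the Fisher information metric on parameter space reduces to the averaged outer product of the network's parameter gradient, $g(\theta)=\mathbb{E}_x[\nabla_\theta f\,\nabla_\theta f^\top]$.
```python
import jax
import jax.numpy as jnp

def f(theta, x):
    # Tiny one-hidden-layer tanh network; theta is a flat parameter vector.
    W = theta[:6].reshape(3, 2)
    v = theta[6:]
    return jnp.dot(v, jnp.tanh(W @ x))

theta = jax.random.normal(jax.random.PRNGKey(0), (9,))
xs = jax.random.normal(jax.random.PRNGKey(1), (100, 2))  # input samples

# Per-input parameter gradient df/dtheta, then the Fisher metric
# g = E_x[grad grad^T] for the unit-variance Gaussian output model.
jac = jax.vmap(lambda x: jax.grad(f)(theta, x))(xs)   # shape (100, 9)
fisher = jac.T @ jac / xs.shape[0]                    # shape (9, 9)

print(jnp.linalg.eigvalsh(fisher)[-3:])  # a few largest eigenvalues
```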