Neural Networks as Quantum Field Theories and basic Neural Tangent Kernel Theory
Below is the sketch of my script for the video at:
https://youtu.be/ZSmORp3Bm2c
Paper discussed:
- https://arxiv.org/pdf/2307.03223.pdf
Large width network theory:
- https://en.wikipedia.org/wiki/Universal_approximation_theorem
- https://en.wikipedia.org/wiki/Gaussian_process
- https://en.wikipedia.org/wiki/Neural_network_Gaussian_process
- Neural Tangent Kernel theory:
+ https://en.wikipedia.org/wiki/Neural_tangent_kernel
+ https://github.com/google/neural-tangents
+ https://lilianweng.github.io/posts/2022-09-08-ntk/
+ https://github.com/google/neural-tangents/blob/main/presentation/neurips_linearization_poster.pdf
Information geometry:
- https://en.wikipedia.org/wiki/Information_geometry
QFT:
- https://en.wikipedia.org/wiki/Lagrangian_(field_theory)#Scalar_field_theory
- https://en.wikipedia.org/wiki/Schwinger_function
- https://en.wikipedia.org/wiki/Propagator
- https://en.wikipedia.org/wiki/Correlation_function_(quantum_field_theory), $G=\langle v\mid \phi\cdots\phi\mid v\rangle$
- https://en.wikipedia.org/wiki/Partition_function_(quantum_field_theory), $Z[J]$, whose expansion is a sum of products of $G$'s; view it as a generating functional and connect it to the path integral
- https://en.wikipedia.org/wiki/Quantum_field_theory
- https://en.wikipedia.org/wiki/LSZ_reduction_formula
===== Neural network Gaussian processes and neural tangent kernel =====
Briefly discussed in this video:
$\bullet$ Neural Network Gaussian Processes (NNGPs)
$\bullet$ Neural Tangent Kernel theory
$\bullet$ Quantum field theory motivation/reminders
$\bullet$ ANN-for-QFT and QFT-for-ANN
==== NNGPs ====
$\bullet$ Paper: https://arxiv.org/pdf/2307.03223.pdf
$\bullet$ For simplicity, consider a fully connected network with one hidden layer.
- E.g. https://www.gabormelli.com/RKB/File:2NNw.png
$\bullet$ Large width limit
- https://en.wikipedia.org/wiki/Universal_approximation_theorem
- and more, as you see in this video
$\bullet$ https://en.wikipedia.org/wiki/Gaussian_process
$\bullet$ https://en.wikipedia.org/wiki/Neural_network_Gaussian_process
- Neal '95
$\bullet$ (1.2) => (1.3) using e.g. (1.7) (note: zero mean), giving (1.11) and, more specifically, (1.13); a numerical sketch of this Gaussian limit follows at the end of this section
$\bullet$ Non-Gaussian, non-zero-cumulant effects emerge either from not taking the infinite width limit,
or alternatively from violating the statistical assumptions of the central limit theorem.
The latter can in practice be done by making the random sampling of certain parameters (weight components)
depend on already sampled parameters.
(But note that with the finite-width approach, the universal approximation property will generally also break.)
$\bullet$ https://en.wikipedia.org/wiki/Lagrangian_(field_theory)#Scalar_field_theory
- see scalar field theory
$\bullet$ Wick rotation, $i^2=-1$,
- https://en.wikipedia.org/wiki/Schwinger_function
- Osterwalder–Schrader axioms
$\bullet$ Random initialization => field theory sampling
$\bullet$ In the other direction: use e.g. Feynman diagrams to compute information about the typical random initialization
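To make the NNGP statement above concrete, here is a minimal numerical sketch (my own illustration, not code from the paper): sample many random initializations of a one-hidden-layer ReLU network with the standard $1/\sqrt{\rm width}$ output scaling, and compare the empirical covariance of the outputs at two inputs with the closed-form Gaussian expectation $E_w[{\rm ReLU}(w\cdot x_1)\,{\rm ReLU}(w\cdot x_2)]$ (the degree-1 arc-cosine kernel). Width, sample count, and inputs are arbitrary choices.
```python
import jax
import jax.numpy as jnp

width = 2048       # hidden width; the Gaussian behavior sharpens as this grows
n_samples = 5000   # number of independent random initializations

# Two unit-norm inputs with angle theta between them.
x1 = jnp.array([1.0, 0.0])
x2 = jnp.array([0.6, 0.8])

def f(W, v, x):
    # One-hidden-layer ReLU network with 1/sqrt(width) output scaling,
    # the normalization under which the NNGP limit is taken.
    return jnp.dot(v, jax.nn.relu(W @ x)) / jnp.sqrt(width)

def sample_outputs(key):
    kW, kv = jax.random.split(key)
    W = jax.random.normal(kW, (width, 2))   # i.i.d. N(0,1) weights
    v = jax.random.normal(kv, (width,))
    return f(W, v, x1), f(W, v, x2)

keys = jax.random.split(jax.random.PRNGKey(0), n_samples)
f1, f2 = jax.vmap(sample_outputs)(keys)

# Empirical covariance over initializations vs. the closed form
# E_w[relu(w.x1) relu(w.x2)] = (sin(theta) + (pi - theta) cos(theta)) / (2 pi)
# for unit-norm inputs (degree-1 arc-cosine kernel).
theta = jnp.arccos(jnp.clip(jnp.dot(x1, x2), -1.0, 1.0))
analytic = (jnp.sin(theta) + (jnp.pi - theta) * jnp.cos(theta)) / (2 * jnp.pi)
print(float(jnp.mean(f1 * f2)), float(analytic))  # these should roughly agree
```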
==== Sidenote on learning ====
$\bullet$ Learning => fulfill a task
$\bullet$ Gradient descent
- https://egallic.fr/Enseignement/ML/ECB/book/figs/example_two_dim/descent_2D_sphere.gif
$\bullet$ As is common in stochastic control, one may take the step sizes to a very fine limit, i.e. pass to calculus proper (gradient flow):
$\theta'(t) = -\nabla_\theta\Phi$
with
$\Phi := \sum_z C(f_{\theta(t)}(z), y_z)$
$z$ ... training data
$\bullet$ Compute the time development of the output $f$. This ends up looking a lot like the equations of motion of Hamiltonian mechanics.
$\bullet$ https://en.wikipedia.org/wiki/Neural_tangent_kernel#Details
- Analytical solutions are possible in the NNGP limit ($\Theta$ constant etc.); see the neural-tangents sketch after this list
+ People currently study to what extent this remains valid also at finite width
- https://github.com/google/neural-tangents
- https://lilianweng.github.io/posts/2022-09-08-ntk/
- https://github.com/google/neural-tangents/blob/main/presentation/neurips_linearization_poster.pdf
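The neural-tangents library linked above packages exactly this: analytic NNGP/NTK kernels of the infinite-width network and the closed-form gradient-flow solution for MSE loss. A minimal sketch along the lines of the library's README (the toy data here is made up):
```python
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width fully connected network with one hidden layer, as above.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1))

x_train = random.normal(random.PRNGKey(1), (10, 2))
y_train = jnp.sin(x_train[:, :1])                 # toy regression targets
x_test = random.normal(random.PRNGKey(2), (5, 2))

# Analytic NNGP and NTK kernels of the infinite-width network.
kernels = kernel_fn(x_train, x_train, ('nngp', 'ntk'))

# Closed-form distribution of the network ensemble after gradient flow
# on the MSE loss (Theta is constant in this limit, so this is exact).
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train)
mean, cov = predict_fn(x_test=x_test, get='ntk', compute_cov=True)
print(mean.shape, cov.shape)   # predictive mean and covariance on x_test
```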
==== QFT motivation (micro crash course) ====
$\bullet$ QM reminder: the square root of chance (amplitude) for a finite time evolution is given by a pairing in Hilbert space
+ $\langle out\mid e^{-itH} \mid in \rangle =: K$, i.e. the pairing of $\mid out\rangle$ and $e^{-itH} \mid in\rangle$, i.e. the propagated $\mid in\rangle$.
+ $e^{-itH}$ solves the Schrödinger equation; $H$ infinitesimally generates the evolution, $i\partial_t \Psi = H\Psi$
$\bullet$ $G$ ... the 2-point function/propagator/Green's function solves the differential equation with a delta peak as source.
- Often, roughly, $G=\Theta\cdot K$ ($\Theta$ ... here just a Heaviside function)
+ https://en.wikipedia.org/wiki/Propagator#Basic_examples:_propagator_of_free_particle_and_harmonic_oscillator
$\bullet$ More statistical view for QFT:
- https://en.wikipedia.org/wiki/Correlation_function_(quantum_field_theory), $G=\langle v\mid \phi\cdots\phi\mid v\rangle$
- https://en.wikipedia.org/wiki/Partition_function_(quantum_field_theory), $Z[J]$, whose expansion is a sum of products of $G$'s; view it as a generating functional and connect it to the path integral (see the worked free-field identities at the end of this section)
$\bullet$ LSZ ultra-roughly:
- Interested in $K$ ... operator-valued distribution formalism ... $K$ is a function of $\langle p_{out}\mid \phi\cdots\phi\mid p_{in}\rangle$.
- Energy comes in through the field operator expression
+ $E = \hbar\omega = mc^2\sqrt{1+\left(\frac{\vec p}{mc}\right)^2} \approx mc^2 + \tfrac{1}{2}m{\vec v}^2$
+ $\phi(t, \vec x) \sim \int d{\vec p}\, \omega^{-1/2}\, e^{i(-\omega t + {\vec p}\cdot{\vec x})}\, a_{\vec p}^*$ + c.c.
- https://en.wikipedia.org/wiki/Quantum_field_theory#Path_integrals
+ LSZ: $\langle p_{out}\mid \phi\cdots\phi\mid p_{in}\rangle = \prod_{i=1}^{n_{out}+n_{in}} \Delta(p_i, m_i)\, {\mathcal F} G$
- https://en.wikipedia.org/wiki/LSZ_reduction_formula
$\bullet$ Back to the paper
- E.g. tinkering with parameter-level non-Gaussianity in (4.13)
- §4.2, "φ^4 Theory as a Neural Network Field Theory": break the product form
+ See (4.21)-(4.25)
+ Then the insertion (4.29)
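For reference, the textbook free (Gaussian) scalar field identities behind the rough $Z[J]$ formula above, written out (standard material, not specific to the paper):
```latex
% Euclidean free scalar field: Z[J] is Gaussian in J, G is the Green's
% function of the kinetic operator, and Wick's theorem reduces all
% correlators to sums of products of G's.
\begin{align*}
  Z[J] &= \int \mathcal{D}\phi\;
    e^{-\int d^dx\,\left(\frac{1}{2}(\partial\phi)^2
      + \frac{1}{2}m^2\phi^2 - J\phi\right)}
    \;\propto\; e^{\frac{1}{2}\int d^dx\,d^dy\; J(x)\,G(x-y)\,J(y)} \\
  (-\partial^2 + m^2)\,G(x-y) &= \delta^{(d)}(x-y) \\
  \langle \phi(x_1)\cdots\phi(x_n)\rangle
    &= \frac{1}{Z[0]}
       \frac{\delta^n Z[J]}{\delta J(x_1)\cdots\delta J(x_n)}\Big|_{J=0}
     = \sum_{\text{pairings}}\;\prod_{\text{pairs}\,(i,j)} G(x_i - x_j)
\end{align*}
```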
==== Sidenote on related topics ====
$\bullet$ There are also attempts at studying manifold metric flows via an NN parametrization, including parametrizations for physics.
$\bullet$ Of course, not to be confused with the above, there is also 'conventional information geometry', where the network's parameter space is studied as a **neuromanifold** (see the sketch below)
- https://en.wikipedia.org/wiki/Information_geometry
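As a toy illustration of the neuromanifold idea (my own sketch, not from any of the linked sources): for a network with Gaussian output model $p_\theta(y\mid x)=\mathcal{N}(f_\theta(x),1)$, the Fisher information metric on parameter space reduces to the averaged outer product of the network's parameter gradient, $g(\theta)=\mathbb{E}_x[\nabla_\theta f\,\nabla_\theta f^\top]$.
```python
import jax
import jax.numpy as jnp

def f(theta, x):
    # Tiny one-hidden-layer tanh network; theta is a flat parameter vector.
    W = theta[:6].reshape(3, 2)
    v = theta[6:]
    return jnp.dot(v, jnp.tanh(W @ x))

theta = jax.random.normal(jax.random.PRNGKey(0), (9,))
xs = jax.random.normal(jax.random.PRNGKey(1), (100, 2))  # input samples

# Per-input parameter gradient df/dtheta, then the Fisher metric
# g = E_x[grad grad^T] for the unit-variance Gaussian output model.
jac = jax.vmap(lambda x: jax.grad(f)(theta, x))(xs)   # shape (100, 9)
fisher = jac.T @ jac / xs.shape[0]                    # shape (9, 9)

print(jnp.linalg.eigvalsh(fisher)[-3:])  # a few largest eigenvalues
```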