Yes, I have already evaluated the visuals.
To generate the summaries and answers I provided previously, I used the text extracted from the slides in the video you uploaded. The visuals were critical because they contained the mathematical definitions (e.g., the precise definition of "Ladder Decomposition") and the graphs (e.g., the visual argument for how dilation pulls a smooth function toward a near-identity map).
Recommendation for Rebuilding the Page: If you are rebuilding the page, you should absolutely feature specific visuals alongside the text. A text-only summary of this specific talk would fail to convey the core intuition.
Which visuals to include:
- The Function Dilation Plot (Slide 14/15): The graph showing the red box zooming in on the blue curve. This is the intuitive "hook" of the entire theory.
- The Ladder Decomposition Definition (Slide 16): The mathematical notation showing $T = T_d \circ \dots \circ T_1$.
- The Hierarchical Learning Model (Slide 19): The equation showing the residual structure $h_k(x) = h_{k-1}(x) + f(x)$.
Title: Hierarchical Learning: An Entropy-Based Approach to Multiscale Data and Smooth Target Functions
Word Count: ~850 words
This presentation, delivered by Amir R. Asadi (University of Cambridge) at the Banff International Research Station, proposes a novel theoretical framework for supervised learning that addresses the limitations of worst-case analysis (e.g., uniform convergence). By leveraging the inherent multiscale structure of real-world data distributions and the smoothness of target functions, Asadi introduces a hierarchical, residual learning architecture. The model theoretically justifies "curriculum learning"—processing simple examples with shallow networks and reserving deep computation for complex, high-magnitude inputs—thereby offering bounds that are statistically stronger than uniform convergence and computationally efficient (logarithmic inference depth).
The talk begins by addressing the "No Free Lunch" theorem in statistical learning, emphasizing that training data alone provides incomplete information about a target function, so additional prior assumptions about the data and the target are needed for efficient learning.
Asadi posits that two specific priors are ubiquitous in physical and biological datasets but underutilized in learning theory:
- Multiscale Data Domains: Empirical distributions often follow power laws (scale invariance). The input domain is modeled as a ball $\mathcal{X} = \{x \in \mathbb{R}^m : |x| \leq R\}$, with a probability density $q(x)$ that scales according to $q(x/\gamma) = \gamma^\alpha q(x)$.
- Target Smoothness: The target function $T: \mathbb{R}^m \to \mathbb{R}^m$ is assumed to be a diffeomorphism (differentiable, smooth, invertible, with a Lipschitz continuous inverse).
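As an illustrative check (not taken from the slides), a pure power-law density $q(x) = c\,|x|^{-\alpha}$ satisfies this scaling relation exactly:

$$q(x/\gamma) \;=\; c\,|x/\gamma|^{-\alpha} \;=\; \gamma^{\alpha}\, c\,|x|^{-\alpha} \;=\; \gamma^{\alpha}\, q(x).$$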
The mathematical core of the proposal is the concept of Function Dilation. Asadi observes that for any smooth function, zooming in near the origin (the red box closing in on the blue curve in the slides) makes the function look increasingly like a near-identity map. Mathematically, the dilated function is defined by rescaling the input and output of the target by a common scale factor $\gamma$, so that dilating by a small factor yields a map close to the identity.
This observation leads to the Ladder Decomposition: the target function $T$ is written as a composition of near-identity maps, $T = T_d \circ \dots \circ T_1$, with each factor bridging one scale to the next.
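A small numerical sketch of the dilation effect, assuming the illustrative definition $T_\gamma(x) = T(\gamma x)/\gamma$ (the talk's precise definition is not reproduced here) and a toy one-dimensional diffeomorphism:

```python
import numpy as np

# Toy 1-D diffeomorphism with T(0) = 0 and T'(0) = 1 (hypothetical example).
def T(x):
    return x + 0.1 * x**3

# Illustrative dilation: rescale input and output by the same factor gamma.
def dilate(T, gamma):
    return lambda x: T(gamma * x) / gamma

xs = np.linspace(-1.0, 1.0, 1001)
for gamma in (1.0, 0.5, 0.25, 0.125):
    T_gamma = dilate(T, gamma)
    deviation = np.max(np.abs(T_gamma(xs) - xs))  # distance from the identity map
    print(f"gamma = {gamma:5.3f}   sup |T_gamma(x) - x| on [-1, 1] = {deviation:.6f}")
```

For this toy map the deviation from the identity shrinks like $\gamma^2$, which is the qualitative behaviour the dilation plot conveys: the more you zoom in, the closer the curve tracks the identity line.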
The theoretical decomposition directly motivates a Residual Neural Network (ResNet) architecture. Since each layer approximates a near-identity mapping, the hypothesis space can be restricted to residual functions of the form $h_k(x) = h_{k-1}(x) + f(x)$, i.e., the identity plus a small learned correction at each step.
Because the data distribution is multiscale, not all inputs require the full depth of the network.
- Inputs with small norms (essentially linear near the origin) are processed by early layers.
- Inputs with large norms (high complexity) traverse the full depth.
- The theoretical derivation suggests that the required network depth for an instance $x$ is proportional to $\log|x|$ (see the sketch below).
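Below is a minimal sketch of such a residual stack with a norm-dependent depth rule; the block parameterization, the `required_depth` rule, and all names are illustrative assumptions, not the construction from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical residual stack: each block applies a near-identity update
# h <- h + f_k(h), with f_k a small linear map (the talk's exact
# parameterization may differ).
DIM, DEPTH = 8, 16
blocks = [0.05 * rng.standard_normal((DIM, DIM)) for _ in range(DEPTH)]

def required_depth(x, base_radius=1.0):
    """Toy early-exit rule: depth grows like the log of the input norm."""
    norm = np.linalg.norm(x)
    if norm <= base_radius:
        return 1
    return min(DEPTH, 1 + int(np.ceil(np.log2(norm / base_radius))))

def forward(x):
    """Apply only as many residual blocks as the input's scale requires."""
    h = x
    d = required_depth(x)
    for W in blocks[:d]:
        h = h + W @ h          # residual update: identity plus a small correction
    return h, d

for scale in (0.5, 2.0, 8.0, 64.0):
    x = scale * rng.standard_normal(DIM) / np.sqrt(DIM)
    _, d = forward(x)
    print(f"|x| ~ {np.linalg.norm(x):7.2f}  ->  blocks used: {d}")
```

The specific depth rule and block weights here are placeholders; the point is only that small-norm inputs exit after a few near-identity blocks while large-norm inputs traverse more of the stack.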
To provide rigorous statistical guarantees, the research employs a Gibbs Variational Principle. The parameters of the model are not treated as point estimates; instead, they are sampled from Gibbs distributions defined by the loss at each scale.
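For reference, the standard form of the Gibbs variational principle (the talk's exact statement and notation may differ): for a prior $Q$ over parameters $w$, a loss $L(w)$, and an inverse temperature $\lambda > 0$,

$$-\tfrac{1}{\lambda}\log \mathbb{E}_{Q}\!\left[e^{-\lambda L(w)}\right] \;=\; \min_{P}\;\Big\{\, \mathbb{E}_{P}[L(w)] + \tfrac{1}{\lambda}\, D_{\mathrm{KL}}(P \,\|\, Q) \,\Big\},$$

with the minimum attained by the Gibbs distribution $\mathrm{d}P^{*}(w) \propto e^{-\lambda L(w)}\,\mathrm{d}Q(w)$; this is the sense in which the per-scale posteriors are "defined by the loss at each scale."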
Using the chain rule for Kullback-Leibler (KL) divergence, Asadi derives a bound on the Chained Risk. The theorem states that the expected loss of the hierarchical model is bounded by the sum of entropic complexities at each scale.
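The chain rule in question is the standard decomposition of KL divergence across the scales $k = 1, \dots, d$ (interpreting each term as the entropic complexity of scale $k$ follows the talk's terminology):

$$D_{\mathrm{KL}}\big(P_{W_1,\dots,W_d} \,\|\, Q_{W_1,\dots,W_d}\big) \;=\; \sum_{k=1}^{d} \mathbb{E}_{P_{W_{1:k-1}}}\Big[ D_{\mathrm{KL}}\big(P_{W_k \mid W_{1:k-1}} \,\|\, Q_{W_k \mid W_{1:k-1}}\big) \Big],$$

so the total complexity of the hierarchical posterior splits into one KL term per scale, which is what lets the Chained Risk be bounded by a sum of per-scale contributions.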
The analysis further demonstrates that exploiting the scale-invariant property of the input distribution yields tighter bounds than standard uniform convergence, particularly when the scale parameter of the distribution places most of the probability mass at small scales.
Asadi’s work offers a formal mathematical justification for several empirical phenomena in Deep Learning:
- Curriculum Learning: The model naturally learns "easy" (small scale) features first and progressively tackles "hard" (large scale) features, mirroring human learning.
- Efficiency: The logarithmic depth dependence ($\log|x|$) suggests massive computational savings are possible by implementing early-exit strategies at inference time on real-world, heavy-tailed datasets.
- Constructive Deep Learning: Rather than treating deep networks as black boxes, this framework views them as a discretized flow of diffeomorphisms, where depth is a necessary tool to construct complex functions from simple, near-identity building blocks.
This research bridges the gap between approximation theory (wavelets/multiresolution analysis) and statistical learning theory, providing a pathway toward more interpretable and efficient deep learning models.
FUNDING OPPORTUNITIES
TIER 1: Best Fit for Your Project
- Sloan Foundation - Technology Program
- Sloan Foundation - Open Source in Science
- NSF CSSI (Cyberinfrastructure for Sustained Scientific Innovation)
TIER 2: Strong Alignment
- Chan Zuckerberg Initiative - Essential Open Source Software for Science (EOSS)
- Schmidt Futures/Schmidt Sciences
- Open Philanthropy - Scientific Research
TIER 3: Worth Exploring
- Arcadia Fund - Open Access Program
- Simons Foundation
- Moore Foundation - Data-Driven Discovery
- Wellcome Trust (partner with CZI on EOSS)
Start here: Sloan (Open Source in Science) + NSF CSSI + CZI EOSS