Variant of AM-GM for Minimization
For functions of the form $f(x) = x^a + \frac{1}{x^b}$, a variant of the AM-GM inequality gives the minimum in closed form. Specifically, if you have (with $c_1, c_2, a, b > 0$):
$$
f(x) = c_1 \cdot x^a + c_2 \cdot \frac{1}{x^b}
$$
The minimum occurs at:
$$
x_{\text{min}} = \left(\frac{c_2 \cdot b}{c_1 \cdot a}\right)^{\frac{1}{a+b}}
$$
And the minimum value is:
$$
f_{\text{min}} = c_1 \left(\frac{c_2 \cdot b}{c_1 \cdot a}\right)^{\frac{a}{a+b}} + c_2 \left(\frac{c_1 \cdot a}{c_2 \cdot b}\right)^{\frac{b}{a+b}}
$$
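As a quick numerical sanity check of the two closed forms above (the constants below are arbitrary example values, not from the text), a short Python sketch:

```python
# Check the closed-form minimizer of f(x) = c1*x**a + c2/x**b
# (all constants positive). c1, c2, a, b are arbitrary examples.

def f(x, c1, c2, a, b):
    return c1 * x**a + c2 / x**b

def x_min(c1, c2, a, b):
    # x_min = ((c2*b) / (c1*a))**(1/(a+b))
    return ((c2 * b) / (c1 * a)) ** (1.0 / (a + b))

def f_min(c1, c2, a, b):
    r = (c2 * b) / (c1 * a)
    return c1 * r ** (a / (a + b)) + c2 * (1.0 / r) ** (b / (a + b))

c1, c2, a, b = 2.0, 3.0, 0.5, 0.7
xm = x_min(c1, c2, a, b)

# f evaluated at the claimed minimizer matches the closed-form value...
assert abs(f(xm, c1, c2, a, b) - f_min(c1, c2, a, b)) < 1e-9
# ...and beats nearby points on both sides.
assert f(xm, c1, c2, a, b) < f(xm * 1.01, c1, c2, a, b)
assert f(xm, c1, c2, a, b) < f(xm * 0.99, c1, c2, a, b)
print(xm, f_min(c1, c2, a, b))
```

The same check passes for any positive choice of the four constants, since the derivative $c_1 a x^{a-1} - c_2 b x^{-b-1}$ has a single positive root.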
Main Problem: Symbolic Minimization
Consider the problem of minimizing
$$
f(x, y) = \frac{A}{x^a} + \frac{B}{y^b}
$$
subject to the constraint
$$
x^X \cdot y^Y = Z
$$
To minimize $f(x, y)$, express $y$ in terms of $x$ using the constraint, $y = Z^{1/Y} \cdot x^{-X/Y}$, and substitute it into the objective:
$$
f(x) = A \cdot x^{-a} + \frac{B}{Z^{\frac{b}{Y}}} \cdot x^{\frac{Xb}{Y}}
$$
This function can then be minimized using the variant of the AM-GM inequality, yielding:
$$
x_{\text{min}} = \left(\frac{A \cdot Y \cdot Z^{\frac{b}{Y}} \cdot a}{B \cdot X \cdot b}\right)^{\frac{Y}{Xb + Ya}} = C_1 Z^{\frac{b}{Xb + Ya}}
$$
$$
y_{\text{min}} = C_2 Z^{\frac{a}{Xb + Ya}}
$$
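A numerical check of the constrained minimizer (again with arbitrary example constants, not values from the text): search along the constraint curve and confirm the closed-form $x_{\text{min}}$ beats its neighbors.

```python
# Minimize f(x, y) = A/x**a + B/y**b subject to x**X * y**Y == Z,
# by eliminating y with the constraint. Constants are arbitrary examples.

A, B, a, b = 1.5, 2.5, 0.6, 0.4
X, Y, Z = 0.7, 1.3, 50.0

def y_from_x(x):
    # Constraint: y = (Z * x**-X)**(1/Y)
    return (Z * x ** (-X)) ** (1.0 / Y)

def f(x):
    return A / x**a + B / y_from_x(x)**b

# Closed-form minimizer from the derivation above.
x_min = (A * Y * Z ** (b / Y) * a / (B * X * b)) ** (Y / (X * b + Y * a))
y_min = y_from_x(x_min)

# x_min beats nearby points along the constraint curve...
assert f(x_min) < f(x_min * 1.01)
assert f(x_min) < f(x_min * 0.99)
# ...and (x_min, y_min) satisfies the constraint.
assert abs(x_min**X * y_min**Y - Z) < 1e-9 * Z
print(x_min, y_min)
```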
The ratio of these two values is:
$$
x/y = C_3 Z^{\frac{b - a}{Xb + Ya}}
$$
Here $x$ is the number of model parameters ($N$) and $y$ is the dataset size in tokens ($D$).
If we use Strassen-style fast matrix multiplication, forward/backward passes cost $O(N^{0.68} D)$ instead of $6ND$
(let $N = 12 L h^2$, and process the $D$ tokens in $D/h$ batches of size $h$; a forward pass on one batch then shrinks from $O(h^3)$ to $O(h^{2.371})$, using the best known fast-matmul exponent $\omega \approx 2.371$ rather than Strassen's $2.807$).
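One way to read the $N^{0.68}$ exponent (an illustrative back-of-envelope check, not spelled out in the source): a batch of $h$ tokens costs $h^{\omega}$, so each token costs $h^{\omega-1}$, and since $N \propto h^2$ at fixed depth $L$, per-token compute scales as $N^{(\omega-1)/2}$.

```python
# Illustrative check: with fast-matmul exponent omega ~= 2.371,
# per-token compute is h**(omega - 1); since N is proportional to h**2
# at fixed depth, that is N**((omega - 1) / 2).
omega = 2.371
exponent = (omega - 1) / 2
print(exponent)  # close to the 0.68 used in O(N**0.68 * D)
```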
Constraint: $x^{0.68} \cdot y = C$ (the compute budget)
Objective: $\frac{A}{x^{0.34}} + \frac{B}{y^{0.28}}$ (exponents taken from the Chinchilla Huber-loss regression fit)
Using the derived formulas, substitute $X = 0.68$, $Y = 1$, $a = 0.34$, and $b = 0.28$, so that $Xb + Ya = 0.5304$:
$$
N_{\text{opt}} \propto C^{0.28/0.5304} \approx C^{0.53}, \qquad D_{\text{opt}} \propto C^{0.34/0.5304} \approx C^{0.64}
$$
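The two exponents follow directly from the formulas above; a short computation (and a consistency check that the pair satisfies the compute constraint $N^{X} \cdot D = C$):

```python
# Optimal scaling exponents for the Strassen-style compute model:
# N_opt ~ C**(b/(X*b + Y*a)),  D_opt ~ C**(a/(X*b + Y*a)).
X, Y, a, b = 0.68, 1.0, 0.34, 0.28
denom = X * b + Y * a          # 0.68*0.28 + 0.34 = 0.5304
n_exp = b / denom              # exponent for N_opt
d_exp = a / denom              # exponent for D_opt

print(round(n_exp, 3), round(d_exp, 3))  # → 0.528 0.641
# Consistency: N**X * D == C requires X*n_exp + d_exp == 1.
assert abs(X * n_exp + d_exp - 1.0) < 1e-12
```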
This means that, under fast matrix multiplication, the optimal dataset size grows faster with compute than the parameter count, i.e. data should substantially outscale parameters.