Skip to content

Instantly share code, notes, and snippets.

@cloneofsimo
Created August 19, 2024 19:04
Show Gist options
  • Save cloneofsimo/db34c8f8bc50129a28ef96edde171a8b to your computer and use it in GitHub Desktop.
Save cloneofsimo/db34c8f8bc50129a28ef96edde171a8b to your computer and use it in GitHub Desktop.

Variant of AM-GM for Minimization

When dealing with functions of the form $f(x) = x^a + \frac{1}{x^b}$, a variant of the AM-GM inequality can be used to find the minimum. Specifically, if you have:

$$ f(x) = c_1 \cdot x^a + c_2 \cdot \frac{1}{x^b} $$

The minimum occurs at:

$$ x_{\text{min}} = \left(\frac{c_2 \cdot b}{c_1 \cdot a}\right)^{\frac{1}{a+b}} $$

And the minimum value is:

$$ f_{\text{min}} = c_1 \left(\frac{c_2 \cdot b}{c_1 \cdot a}\right)^{\frac{a}{a+b}} + c_2 \left(\frac{c_1 \cdot a}{c_2 \cdot b}\right)^{\frac{b}{a+b}} $$

Main Problem: Symbolic Minimization

Consider the problem where we want to minimize:

Constraint:

$$ x^X \cdot y^Y = Z $$

Objective Function:

$$ f(x, y) = \frac{A}{x^a} + \frac{B}{y^b} $$

To minimize $f(x, y)$, express $y$ in terms of $x$ using the constraint and substitute it into the objective function:

$$ f(x) = A \cdot x^{-a} + \frac{B}{Z^{\frac{b}{Y}}} \cdot x^{\frac{Xb}{Y}} $$

This function can then be minimized using the variant of the AM-GM inequality, yielding:

$$ x_{\text{min}} = \left(\frac{A \cdot Y \cdot Z^{\frac{b}{Y}} \cdot a}{B \cdot X \cdot b}\right)^{\frac{Y}{Xb + Ya}} = C_1 Z^{\frac{b}{Xb + Ya}} $$

$$ y_{\text{min}} = C_2 Z^{\frac{a}{Xb + Ya}} $$

Ratio of these two values would look like:

$$ x/y = C_3 Z^{\frac{b - a}{Xb + Ya}} $$

Strassen Chinchilla

Given:

x : Number of parameters

y : Number of dataset.

If we use Strassen matmul, forward / backward passes become $O(N^{0.68} D)$ instead of $6ND$

(let $N = 12 L h^2$, $D = h D/h$, forward with batch size of $h$ shrinks from $h^3$ to $h^{2.371}$).

  • Constraint: $x^{0.68} \cdot y = \text{constant}$
  • Objective: $\frac{A}{x^{0.34}} + \frac{B}{y^{0.28}}$ (Taken from chinchilla huber regression fit)

Solution:

Using the derived formula, substitute $X = 0.68$, $Y = 1$, $a = 0.34$, and $b = 0.28$

$$ N_{opt} = C^{0.54}, D_{opt} = C^{0.66} $$

This means, in a sense, data should be much larger than parameters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment