Simo Ryu cloneofsimo

import pandas as pd

def plot_lr_final_loss_batchsize(file_path):
    # Load the data
    data = pd.read_csv(file_path)
    
    # Extract columns that match 'val_loss/val_loss_'
    val_loss_columns = [col for col in data.columns if col.startswith('val_loss/val_loss_')]
    
    # Sort val_loss_columns by K (numeric value after 'val_loss/val_loss_') in increasing order
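
The sorting step described in the last comment can be sketched as follows. This is a minimal illustration, assuming that K is a plain integer suffix; the helper name `sort_val_loss_columns` and the sample column names are hypothetical and do not come from the original snippet.

```python
def sort_val_loss_columns(columns, prefix="val_loss/val_loss_"):
    """Return the columns starting with `prefix`, ordered by their integer suffix K."""
    matching = [col for col in columns if col.startswith(prefix)]
    # Sort numerically, not lexicographically, so that 8 < 32 < 128.
    return sorted(matching, key=lambda col: int(col[len(prefix):]))

# Hypothetical column names, standing in for data.columns.
cols = ["step", "val_loss/val_loss_128", "val_loss/val_loss_8", "val_loss/val_loss_32"]
sorted_cols = sort_val_loss_columns(cols)
print(sorted_cols)
```

A plain `sorted(...)` on the strings would put `128` before `8`, which is why the key converts the suffix to `int`.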

Summarization:

As a professional summarizer, create a concise and comprehensive summary of the provided text, be it an article, post, conversation, or passage, while adhering to these guidelines:

Craft a summary that is detailed, thorough, in-depth, and complex, while maintaining clarity and conciseness.

Incorporate main ideas and essential information, eliminating extraneous language and focusing on critical aspects.

Rely strictly on the provided text, without including external information.

Variant of AM-GM for Minimization

When dealing with functions of the form $f(x) = x^a + \frac{1}{x^b}$ on $x > 0$ with $a, b > 0$, a variant of the AM-GM inequality can be used to find the minimum. Specifically, for positive constants $c_1$ and $c_2$, if you have:

$$ f(x) = c_1 \cdot x^a + c_2 \cdot \frac{1}{x^b} $$

The minimum occurs at:
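
Assuming $a, b, c_1, c_2 > 0$ and $x > 0$, setting $f'(x) = a c_1 x^{a-1} - b c_2 x^{-b-1} = 0$ gives $x^* = \left( \frac{b \, c_2}{a \, c_1} \right)^{1/(a+b)}$, which is also the equality case of the weighted AM-GM inequality. The sketch below checks this closed form against a brute-force grid search; the function names and the constants are illustrative choices, not part of the original note.

```python
def argmin_power_sum(c1, a, c2, b):
    """Closed-form minimizer of c1 * x**a + c2 / x**b for x > 0 (all constants positive)."""
    return (b * c2 / (a * c1)) ** (1.0 / (a + b))

def f(x, c1, a, c2, b):
    return c1 * x ** a + c2 / x ** b

# Illustrative constants.
c1, a, c2, b = 2.0, 3.0, 5.0, 1.5
x_star = argmin_power_sum(c1, a, c2, b)

# Brute-force check on a grid over (0, 10] with step 0.01.
grid = [0.01 * k for k in range(1, 1001)]
x_grid = min(grid, key=lambda x: f(x, c1, a, c2, b))
print(x_star, x_grid)
```

The grid minimizer should land within one grid step of the closed-form value, and no grid point should give a smaller function value than $x^*$.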

Polynomial Maps from $S^n$ to $S^n$

Motivated by https://x.com/gabrielpeyre/status/1837156819799577034

	# Suppose you have a neural network where
	# x_l = a_l * W_l x_{l-1}, W_l_{i,j} ~ N(0, b_l^2), and the learning rate of W_l is c_l.
	# If you are using Adam, then for any A > 0 you can rescale
	# a_l <- a_l * A, b_l <- b_l / A, c_l <- c_l / A
	# and the training dynamics will be exactly identical to before (ignoring Adam's epsilon).
	# This is known as ABC (ABCD) redundancy. For the more general case: https://arxiv.org/abs/2308.01814
	# Let me show you what I mean:


	import torch
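
The rescaling rule in the comments above can be checked on a toy problem. The following is a minimal sketch in pure Python and not the original demonstration: it assumes a scalar two-layer linear model, a hand-written Adam with epsilon set to 0, and explicit initial weights (dividing them by A stands in for drawing the same random sample with standard deviation b_l / A). The data, the `train` function, and all constants are hypothetical.

```python
import math

# Toy data for a scalar two-layer linear model y = a2*w2 * (a1*w1 * x).
DATA = [(1.0, 2.0), (2.0, 3.5), (-1.0, -2.5)]

def train(a, w_init, lr, steps=50, beta1=0.9, beta2=0.999):
    """Run hand-written Adam (epsilon = 0) and return the loss at every step."""
    w, m, v = list(w_init), [0.0, 0.0], [0.0, 0.0]
    losses = []
    for t in range(1, steps + 1):
        grads, loss = [0.0, 0.0], 0.0
        for x, target in DATA:
            h = a[0] * w[0] * x
            err = a[1] * w[1] * h - target
            loss += err * err / len(DATA)
            # Chain rule for the mean squared error.
            grads[0] += 2 * err * a[1] * w[1] * a[0] * x / len(DATA)
            grads[1] += 2 * err * a[1] * h / len(DATA)
        losses.append(loss)
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * grads[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grads[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            w[i] -= lr[i] * m_hat / math.sqrt(v_hat)
    return losses

a, w0, lr = [1.0, 1.0], [0.5, -0.3], [0.05, 0.05]
A = [4.0, 0.5]  # a separate rescaling factor per layer

# a_l <- a_l * A, initial weights (scale b_l) <- / A, c_l <- c_l / A
a_s = [a[i] * A[i] for i in range(2)]
w_s = [w0[i] / A[i] for i in range(2)]
lr_s = [lr[i] / A[i] for i in range(2)]

base = train(a, w0, lr)
rescaled = train(a_s, w_s, lr_s)
print(max(abs(p - q) for p, q in zip(base, rescaled)))
```

The two loss curves should agree up to floating-point error. With a nonzero epsilon, as in most default Adam configurations, the match is only approximate.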

	import os
	import torch
	import json
	from PIL import Image
	from torch.utils.data import Dataset, DataLoader
	from diffusers.models import AutoencoderKL
	from streaming import MDSWriter

	import logging
	import time

	import os
	import torch
	import json
	from PIL import Image
	from torch.utils.data import Dataset
	from diffusers.models import AutoencoderKL
	from streaming import MDSWriter

	import logging
	import time

	import torch
	import triton
	import triton.language as tl
	from triton.language.extra import libdevice

	@triton.jit
	def fractal_kernel(
	    zr_ptr, zi_ptr, cr_ptr, ci_ptr, output_ptr,
	    alpha_ptr, beta_ptr, poly0_ptr, poly1_ptr, poly2_ptr, poly3_ptr, p_ptr, R, max_iter,
	    H, W,

	import torch
	import time

	torch.backends.cuda.matmul.allow_tf32 = True
	torch.backends.cudnn.allow_tf32 = True
	torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = False

	@torch.no_grad()
	def benchmark_gemm(m, k, n, dtype=torch.bfloat16, allow_bf16_reduce=True):
	    torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = allow_bf16_reduce

	import torch
	import torch.nn as nn
	import torch.nn.functional as F
	from torchvision import datasets, transforms
	import numpy as np
	import math


	def compute_activation_std(model, dataset, device='cpu', batch_size=32, num_workers=0, layer_names=None):
	    activations = {}

Understanding Polynomial Maps on Spheres