Insu Jeon insujeon

Technical Overview and Explanation of "Scalable MatMul-free Language Modeling"

Introduction

This paper presents a novel approach to large language models (LLMs) that eliminates matrix multiplication (MatMul) operations, which are typically the most computationally expensive part of such models. By doing so, the authors aim to significantly reduce memory usage and improve computational efficiency, enabling the models to scale up to billions of parameters while maintaining performance comparable to state-of-the-art Transformers.

Key Contributions

MatMul-Free Dense Layers: The core innovation lies in replacing MatMul operations in dense layers with addition operations using ternary weights. These ternary weights take values from {-1, 0, +1}, which allows matrix multiplications to be transformed into simple additions and subtractions.
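
To illustrate why ternary weights remove the need for multiplication, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names `ternary_matvec` and `dense_matvec` are hypothetical, and the sketch leaves out any scaling, quantization, or hardware-level details. It only shows that a matrix-vector product with weights in {-1, 0, +1} reduces to adding or subtracting the inputs.

```python
def ternary_matvec(weights, x):
    """Compute W @ x for W with entries in {-1, 0, +1} using only add/subtract."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the input
            elif w == -1:
                acc -= xi      # -1 weight: subtract the input
            # a 0 weight contributes nothing, so the input is skipped
        out.append(acc)
    return out


def dense_matvec(weights, x):
    """Reference matrix-vector product using ordinary multiplication."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Illustrative values only (not taken from the paper).
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]

result = ternary_matvec(W, x)
reference = dense_matvec(W, x)
```

Both functions return the same values for this input, but `ternary_matvec` never multiplies: every weight either selects, negates, or drops its input.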

(DEFUNCT - NO LONGER WORKS) YouTube Video Summarizer (yt_summarize.py)

This gist contains a Python script that generates a transcript or summary of a YouTube video. It fetches video information, transcribes the audio with the Whisper ASR model, and generates a summary with an OpenAI language model.

Features

Fetch YouTube video information (title, description, channel title, etc.)
Transcribe video audio
Generate a summary of the video transcript
Save output as a markdown file

How To Install OpenSSL 1.1.1 on CentOS 7

This tutorial shows how to install OpenSSL 1.1.1 on CentOS 7, since the yum repository only provides versions up to OpenSSL 1.0.

Requirements

Upgrade the system

	yum -y update

	import numpy as np
	import torch
	import torch.nn as nn
	from functorch import vmap, jacrev, make_functional_with_buffers

	batch_size = 2
	in_channels = 5
	out_channels = 20
	feature_shape = 8
	feature = torch.rand(batch_size, in_channels, feature_shape, feature_shape)

	"""
	Laplace approximation of a Beta distribution.
	"""
	import matplotlib.pyplot as plt
	import torch

	x = torch.linspace(0, 1, 200)
	p = torch.distributions.Beta(2, 5)
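
The snippet above stops before the approximation itself. As a hedged sketch of one way a Laplace approximation of Beta(a, b) can be computed (the gist's actual code is not shown here and may differ), the Gaussian is centered at the mode of the density, and its variance is the negative inverse of the second derivative of the log-density at that mode. The helper name `laplace_beta` is hypothetical, and the sketch uses only the standard library rather than torch.

```python
import math


def laplace_beta(a, b):
    """Return (mean, std) of a Gaussian Laplace approximation to Beta(a, b).

    Assumes a > 1 and b > 1 so that the density has an interior mode.
    """
    if a <= 1 or b <= 1:
        raise ValueError("an interior mode requires a > 1 and b > 1")
    # Mode of Beta(a, b): maximizer of (a-1)*log(x) + (b-1)*log(1-x).
    mode = (a - 1) / (a + b - 2)
    # Second derivative of the log-density evaluated at the mode (negative).
    curvature = -(a - 1) / mode**2 - (b - 1) / (1 - mode) ** 2
    # Gaussian variance is the negative inverse curvature.
    std = math.sqrt(-1.0 / curvature)
    return mode, std


# Same parameters as the Beta(2, 5) distribution defined above.
mean, std = laplace_beta(2, 5)
```

For Beta(2, 5) the approximating Gaussian is centered at the mode 0.2, which differs from the distribution's true mean of 2/7 because the Beta density is skewed.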

	import torch
	import torch.nn as nn
	import torch.nn.functional as F
	from torch.nn.parameter import Parameter

	def debug(debug_open, x, layername):
	    """Print the tensor size after a layer when debugging is enabled."""
	    if debug_open:
	        print(x.size(), 'after', layername)

	class PVANet(nn.Module):

	import numpy as np
	import scipy
	import scipy.ndimage
	# Import from scipy.ndimage directly; the .filters and .interpolation submodules are deprecated.
	from scipy.ndimage import gaussian_filter, map_coordinates
	import collections
	from PIL import Image
	import numbers

	__author__ = "Wei OUYANG"

	## Weight norm is now available in PyTorch as a pre-hook, so use that instead :)

	import torch
	import torch.nn as nn
	from torch.nn import Parameter
	from functools import wraps

	class WeightNorm(nn.Module):
	    append_g = '_g'
	    append_v = '_v'

	#!/usr/bin/env python
	"""
	A quick, partial implementation of ENet (https://arxiv.org/abs/1606.02147) using PyTorch.
	The original Torch ENet implementation can process a 480x360 image in ~12 ms (on a P2 AWS
	instance). TensorFlow takes ~35 ms. The PyTorch implementation takes ~25 ms, an improvement
	over TensorFlow, but worse than the original Torch.
	"""

	from __future__ import absolute_import