Self-Attention: Token-by-Token Processing

Setup and Notation

Input Sequence:

We have $T$ tokens in our sequence.
Each token at position $t$ is denoted $\mathbf{x}_t$, where $t \in \{1, 2, \dots, T\}$.
Each token embedding has dimension $d_{model}$.

$$\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, ..., \mathbf{x}_T \quad \text{where each } \mathbf{x}_t \in \mathbb{R}^{d_{model}}$$
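As a concrete picture of this notation, a sequence can be stored as $T$ vectors of length $d_{model}$. The sketch below is only illustrative: the sizes `T = 4` and `d_model = 8`, the random values, and the helper name `token` are assumptions, not part of the original text.

```python
import random

T = 4        # sequence length (illustrative value)
d_model = 8  # embedding dimension (illustrative value)

rng = random.Random(0)  # fixed seed so the example is reproducible

# X[t - 1] plays the role of x_t; each row is one token embedding in R^{d_model}.
X = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(T)]


def token(t):
    """Return x_t using the 1-based position index from the text."""
    if not 1 <= t <= T:
        raise IndexError(f"position t must be in 1..{T}, got {t}")
    return X[t - 1]


print(len(X), len(token(1)))  # number of tokens, embedding dimension
```

Keeping the 1-based lookup in a helper avoids off-by-one mistakes when translating the formulas, which index positions from 1, into 0-based Python lists.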

mukul54 / rope_with_examples.md

Created October 1, 2025 04:00

Rope Code LLaMA

RoPE (Rotary Position Embedding) Explained

The Mathematics Behind RoPE

Core Concept

RoPE encodes positional information by rotating embedding vectors in a way that:

Preserves relative positions: The dot product between a rotated query and a rotated key depends only on their relative offset, not on their absolute positions
Uses rotation: Each token's vector is rotated by an angle proportional to its position
Works in pairs: Dimensions are grouped in pairs, and each pair is rotated together as a 2D vector
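The three properties above can be checked with a small sketch. It assumes adjacent dimensions form the pairs and uses the common base of 10000 for the per-pair frequencies; the function names and sample vectors are illustrative, and real implementations (including some LLaMA code) may pair dimensions differently.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by position * base**(-i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # angle grows linearly with position
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of the pair
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative query and key vectors (assumed values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (3) at two very different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 105), rope_rotate(k, 102))

print(abs(score_near - score_far) < 1e-9)
```

Because each pair is only rotated, the vector's length is unchanged, and the attention score for the two position pairs agrees up to floating-point error since both have the same offset.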

mukul54 / dcd.md

Last active September 8, 2025 05:09

DISCRETE COPULA DIFFUSION (https://openreview.net/pdf?id=FXw0okNcOb)

Discrete Copula Diffusion: Solving the Few-Step Generation Problem

Introduction

Discrete diffusion models have shown remarkable progress in generating complex data like natural language and DNA sequences. However, unlike their continuous counterparts that can produce high-quality samples in just a few denoising steps, discrete diffusion models require hundreds or even thousands of steps to perform well. A recent paper "Discrete Copula Diffusion" identifies the fundamental limitation causing this inefficiency and proposes an elegant solution.

In this blog post, we'll dive deep into understanding why discrete diffusion models struggle with few-step generation and how the proposed copula approach addresses this core limitation.

Background: How Discrete Diffusion Models Work

mukul54 / big-bang-claude-gr.md

Last active July 8, 2025 21:27

big-bang-claude-gr

The key is applying Einstein's field equations to a homogeneous, isotropic universe.

Starting Point: Einstein Field Equations

The foundation is Einstein's field equation:

$$R_{\mu\nu} - \frac{1}{2}Rg_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}T_{\mu\nu}$$

Where:

$R_{\mu\nu}$ is the Ricci curvature tensor
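For the homogeneous, isotropic case mentioned at the start, the standard starting point is the Friedmann-Lemaître-Robertson-Walker (FLRW) metric. The symbols $a(t)$ (scale factor), $k$ (spatial curvature constant), and $\rho$ (mass density) do not appear in the text so far and are introduced here as assumptions, along with the $(-,+,+,+)$ sign convention:

$$ds^2 = -c^2\,dt^2 + a(t)^2\left[\frac{dr^2}{1 - kr^2} + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]$$

Substituting this metric into the field equation, with a perfect fluid for $T_{\mu\nu}$, the time-time component gives the first Friedmann equation:

$$\left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi G}{3}\rho - \frac{kc^2}{a^2} + \frac{\Lambda c^2}{3}$$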

mukul54 / pyspark_array_function.md

Created January 14, 2023 19:29

pyspark_array_function

ARRAY FUNCTION SYNTAX	ARRAY FUNCTION DESCRIPTION
array_contains(column: Column, value: Any)	Checks whether a value is present in an array column. Returns: true if the value is present in the array; false when the value is not present.

Mukul Ranjan mukul54