Mukul Ranjan mukul54

@mukul54
mukul54 / attetnion.md
Last active October 6, 2025 06:21
attention

Self-Attention: Token-by-Token Processing

Setup and Notation

Input Sequence:

  • We have T tokens in our sequence
  • Each token at position t is denoted as $\mathbf{x}_t$, where $t \in \{1, 2, \ldots, T\}$
  • Each token embedding has dimension $d_{model}$

$$\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, ..., \mathbf{x}_T \quad \text{where each } \mathbf{x}_t \in \mathbb{R}^{d_{model}}$$

@mukul54
mukul54 / rope_with_examples.md
Created October 1, 2025 04:00
Rope Code LLaMA

RoPE (Rotary Position Embedding) Explained

The Mathematics Behind RoPE

Core Concept

RoPE encodes positional information by rotating embedding vectors in a way that:

  1. Preserves relative positions: The dot product between two rotated token vectors depends on position only through their relative offset, not their absolute positions
  2. Uses rotation: Each token's vector is rotated by an angle proportional to its position
  3. Works in pairs: Dimensions are grouped in pairs and rotated together
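
The three properties above can be checked numerically with a minimal sketch. It assumes the interleaved pairing (dimensions 0 and 1, 2 and 3, and so on) and the commonly used frequency base of 10000; the function names `rope_rotate` and `dot` are illustrative and not taken from any library, and production implementations may pair dimensions differently.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by a position-dependent angle."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, d, 2):
        # Assumed frequency schedule: the pair starting at dimension i uses base^(-i/d)
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation applied to the pair (x, y)
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors with 4 dimensions (2 rotated pairs)
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Same relative offset (3) at two different absolute positions
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))

# The two scores should agree up to floating-point rounding
print(abs(score_a - score_b) < 1e-9)
```

Because each pair is only rotated, the vector's length is unchanged; only the angle between query and key pairs carries the positional signal.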
@mukul54
mukul54 / dcd.md
Last active September 8, 2025 05:09
DISCRETE COPULA DIFFUSION (https://openreview.net/pdf?id=FXw0okNcOb)

Discrete Copula Diffusion: Solving the Few-Step Generation Problem

Introduction

Discrete diffusion models have shown remarkable progress in generating complex data like natural language and DNA sequences. However, unlike their continuous counterparts that can produce high-quality samples in just a few denoising steps, discrete diffusion models require hundreds or even thousands of steps to perform well. A recent paper "Discrete Copula Diffusion" identifies the fundamental limitation causing this inefficiency and proposes an elegant solution.

In this blog post, we'll dive deep into understanding why discrete diffusion models struggle with few-step generation and how the proposed copula approach addresses this core limitation.

Background: How Discrete Diffusion Models Work

@mukul54
mukul54 / big-bang-claude-gr.md
Last active July 8, 2025 21:27
big-bang-claude-gr

The key is applying Einstein's field equations to a homogeneous, isotropic universe.

Starting Point: Einstein Field Equations

The foundation is Einstein's field equation:

$$R_{\mu\nu} - \frac{1}{2}Rg_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}T_{\mu\nu}$$

Where:

  • $R_{\mu\nu}$ is the Ricci curvature tensor
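
As a sketch of where the homogeneity and isotropy assumption leads (standard textbook forms; the symbols $a(t)$, $k$, and $\rho$ are introduced here and do not appear in the text above): the symmetry restricts the metric to the Friedmann-Lemaître-Robertson-Walker form, with a scale factor $a(t)$ and a spatial curvature constant $k$:

$$ds^2 = -c^2 dt^2 + a(t)^2\left[\frac{dr^2}{1 - kr^2} + r^2\left(d\theta^2 + \sin^2\theta \, d\phi^2\right)\right]$$

Assuming the matter content is a perfect fluid with mass density $\rho$, the time-time component of the field equation then reduces to the first Friedmann equation:

$$\left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi G}{3}\rho - \frac{kc^2}{a^2} + \frac{\Lambda c^2}{3}$$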
@mukul54
mukul54 / pyspark_array_function.md
Created January 14, 2023 19:29
pyspark_array_function
| Array function syntax | Array function description |
| --- | --- |
| `array_contains(column: Column, value: Any)` | Checks whether a value is present in an array column. Returns `true` if the value is present in the array and `false` if it is not. |