@makslevental
makslevental / fuse-mask-reduce.mlir
Last active March 26, 2026 19:01
linalg fusions
// Attention mask + max-reduce using named linalg ops.
// 3 kernels: broadcast mask → add to scores → max-reduce over j.
// After generalize + fuse: broadcast and add are inlined into the reduction.
func.func @fuse_mask_into_max_reduce(
    %scores: tensor<4x512x512xf32>,
    %mask: tensor<512x512xf32>,
    %neg_inf_init: tensor<4x512xf32>) -> tensor<4x512xf32> {
  %init3d = tensor.empty() : tensor<4x512x512xf32>
  // Kernel 1: broadcast the 2-D mask along the batch dimension.
  %bmask = linalg.broadcast ins(%mask : tensor<512x512xf32>)
      outs(%init3d : tensor<4x512x512xf32>) dimensions = [0]
  // Kernel 2: add the broadcast mask to the attention scores.
  %masked = linalg.add ins(%scores, %bmask : tensor<4x512x512xf32>, tensor<4x512x512xf32>)
      outs(%init3d : tensor<4x512x512xf32>) -> tensor<4x512x512xf32>
  // Kernel 3: max-reduce over the innermost (j) dimension.
  %reduced = linalg.reduce { arith.maximumf } ins(%masked : tensor<4x512x512xf32>)
      outs(%neg_inf_init : tensor<4x512xf32>) dimensions = [2]
  return %reduced : tensor<4x512xf32>
}
RTLD_LOCAL, on the other hand, adds the loaded library to a search list that is specific to the current ELF object (called the local scope or scope 1). RTLD_LOCAL is also transitive, meaning that the DT_NEEDED dependencies of a library opened with RTLD_LOCAL will themselves be loaded as RTLD_LOCAL and not added to the global scope.
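The local-vs-global scope behavior can be poked at from Python, since `ctypes` passes these same flags straight through to `dlopen`. A minimal sketch, assuming Linux/glibc (the `libm.so.6` fallback name is an assumption for systems where `find_library` comes up empty):

```python
import ctypes
import ctypes.util

# Assumption: Linux/glibc, where libm is available as "libm.so.6".
# RTLD_LOCAL keeps libm's symbols in the library's own local scope:
# they are reachable through this handle but not added to the global scope.
path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)

libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))  # 1.0 — resolved through the local handle
```

With `mode=ctypes.RTLD_GLOBAL` instead, the symbols would also become visible to subsequent lookups in the global scope (e.g. via `RTLD_DEFAULT`).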
maksimlevental@Maksims-MacBook-Pro-2 _mlir_libs % objdump -p MLIRPythonSupport-mlir.dll | less
SizeOfHeapCommit 0000000000001000
LoaderFlags 00000000
NumberOfRvaAndSizes 00000010
; ModuleID = 'vector_add.air'
source_filename = "vector_add.metal"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64_v27-apple-macosx15.6.0"
; Function Attrs: mustprogress nounwind
define weak_odr void @mlir_vector_add(ptr addrspace(1) noundef "air-buffer-no-alias" %0, ptr addrspace(1) noundef "air-buffer-no-alias" %1, ptr addrspace(1) noundef "air-buffer-no-alias" %2, <3 x i32> noundef %3) local_unnamed_addr #0 {
%5 = extractelement <3 x i32> %3, i64 0
%6 = zext i32 %5 to i64
%7 = getelementptr inbounds half, ptr addrspace(1) %0, i64 %6
  ; continuation reconstructed (truncated in the gist preview):
  ; load both inputs, add, and store to the output buffer.
  %8 = getelementptr inbounds half, ptr addrspace(1) %1, i64 %6
  %9 = load half, ptr addrspace(1) %7, align 2
  %10 = load half, ptr addrspace(1) %8, align 2
  %11 = fadd fast half %9, %10
  %12 = getelementptr inbounds half, ptr addrspace(1) %2, i64 %6
  store half %11, ptr addrspace(1) %12, align 2
  ret void
}
mlir_vector_add
for i in {1..222} ; do
curl 'https://lab.llvm.org/buildbot/api/v2/forceschedulers/force-build-scheduler' \
-H 'Accept: application/json, text/plain, */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-b 'TWISTED_SESSION=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyX2luZm8iOnsiZnVsbF9uYW1lIjoiTWFrc2ltIExldmVudGFsIiwiZW1haWwiOiJtYWtzaW0ubGV2ZW50YWxAZ21haWwuY29tIiwidXNlcm5hbWUiOiJtYWtzbGV2ZW50YWwiLCJncm91cHMiOlsibGx2bSIsImxsdm0vTExWTSBDb21taXR0ZXJzIiwibGx2bS9ldWRzbCBBZG1pbnMiLCJsbHZtL2V1ZHNsIGNvbW1pdHRlcnMiLCJsbHZtL2V1ZHNsLWFkbWlucyIsImxsdm0vZXVkc2wtY29tbWl0dGVycyIsImxsdm0vaXNzdWUtc3Vic2NyaWJlcnMtbWxpci1weXRob24iLCJsbHZtL2lzc3VlLXN1YnNjcmliZXJzLW1saXI6cHl0aG9uIiwibGx2bS9saWdodGhvdXNlIGNvbW1pdHRlcnMiLCJsbHZtL2xpZ2h0aG91c2UtY29tbWl0dGVycyIsImxsdm0vbGx2bS1jb21taXR0ZXJzIl19LCJleHAiOjE3NjExMTA0MjN9.9kzWVd8PEVKx4lrKYmd4td4UDCenJzNV9wRFfvhj9gM; _ga=GA1.1.1617727655.1755182138; _ga_SBS5VNKHC1=GS2.1.s1755182138$o1$g1$t1755182143$j55$l0$h'
done
from collections.abc import Sequence
import mlir
from . import (
ir as ir,
passmanager as passmanager,
rewrite as rewrite
)
from collections.abc import Callable, Sequence
import enum
from typing import Any, overload
import WalkOrder
import mlir
import typing_extensions
from _mlir_libs import Context as Context, MLIRError as MLIRError
diff --git a/mlir/lib/Bindings/Python/Pass.cpp b/mlir/lib/Bindings/Python/Pass.cpp
index 6ee85e8a3149..88557706bd04 100644
--- a/mlir/lib/Bindings/Python/Pass.cpp
+++ b/mlir/lib/Bindings/Python/Pass.cpp
@@ -59,6 +59,12 @@ void mlir::python::populatePassManagerSubmodule(nb::module_ &m) {
//----------------------------------------------------------------------------
// Mapping of the top-level PassManager
//----------------------------------------------------------------------------
+
+ nb::class_<MlirExternalPass>(m, "ExternalPass");
@makslevental
makslevental / farkas.md
Last active July 24, 2025 03:23
farkas lemma

First, I'll say that whoever is telling you that you need to understand Farkas' lemma to understand loop-carried dependencies is completely fucking with you. Even the people who should understand it probably don't. I can think of 5 people in my professional circle who are "polyhedral" people, and I guarantee they don't remember/don't care about Farkas.

Second, Farkas' lemma clicked for me after thinking about barrier functions and duality. Bear with me, because I haven't thought about this in a while:

Ax = b 

is a linear system of constraints, and you're looking for a solution, right? Constrained optimization is "hard" because, in principle, your search algorithm must always satisfy the constraints.

What's easier is unconstrained optimization, e.g. gradient descent. How can you (sort of) turn Ax = b into an unconstrained optimization problem? Write down a penalty function that "penalizes" violating each of the constraints, and then just use gradient descent.
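The penalty-function idea can be sketched in a few lines of NumPy. This is just an illustration of the trick, not anything Farkas-specific yet; `A` and `b` are made-up data:

```python
import numpy as np

# Constrained problem: find x with Ax = b.
# Penalty reformulation: minimize ||Ax - b||^2 over all x, unconstrained.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.zeros(2)
lr = 0.05
for _ in range(2000):
    # gradient of the penalty ||Ax - b||^2 is 2 A^T (Ax - b)
    x -= lr * (2.0 * A.T @ (A @ x - b))

print(np.allclose(A @ x, b, atol=1e-6))  # True — the penalty drove Ax - b to ~0
```

Every violated constraint adds to the objective, so plain gradient descent is pulled toward the feasible set without ever enforcing feasibility explicitly.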