@raphlinus
raphlinus / neon_to_srgb.rs
Created November 15, 2024 15:48
Neon implementation of linear to sRGB transfer function
// Copyright 2024 the Color Authors
// SPDX-License-Identifier: Apache-2.0 OR MIT
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
#[inline(never)]
pub unsafe fn to_srgb(rgba: [f32; 4]) -> [f32; 4] {
let v = aarch64::vld1q_f32(rgba.as_ptr());
let vabs = aarch64::vabsq_f32(v);
let bias = aarch64::vdupq_n_f32(-5.35862651e-04);
@raphlinus
raphlinus / simd_reduce_test.rs
Last active October 29, 2024 21:24
Comparison of scalar and SIMD max reduction
// run with `RUSTFLAGS='-C target-cpu=native' cargo +nightly bench`
#![feature(test)]
fn main() {
let mut a = [0u32; 65536];
a[1] = 42;
println!("{}", scalar_max(&a));
println!("{}", avx2_max(&a));
}
@raphlinus
raphlinus / gist:5aca9de53f9d6b24933cb24d8a60df63
Created March 15, 2024 04:22
apparent miscompilation of flatten.wgsl
1 s_version 0x4004 4 0.01 2
2 s_inst_prefetch 0x3 4 0.01 1
3 s_getpc_b64 s[0:1] 4 0.03 5
4 s_mov_b32 s0, s2 4 0.05 9
5 s_load_dwordx4 s[4:7], s[0:1], null 4 0.01 1
6 s_load_dwordx4 s[12:15], s[0:1], 0x20 4 0.01 1
7 s_load_dwordx4 s[16:19], s[0:1], 0x40 4 0.01 1
8 v_lshl_add_u32 v3, s8, 8, v0 4 0.03 5
9 v_lshrrev_b32_e32 v0, 2, v3 4 0.01 1
10 s_waitcnt lgkmcnt(0)
@raphlinus
raphlinus / relaxed.rs
Last active November 5, 2023 16:35
LB litmus test adapted to loom crate
use std::sync::atomic::Ordering;
use loom::{
sync::{atomic::AtomicU32, Arc},
thread,
};
#[test]
fn relaxed() {
loom::model(|| {
@raphlinus
raphlinus / main.rs
Created July 4, 2023 19:00
Property testing of email scanner
// SPDX-License-Identifier: MIT
use proptest::test_runner::TestRunner;
use proptest::strategy::{Strategy, ValueTree};
fn main() {
let email_re = "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\
(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$";
let re = regex::Regex::new(email_re).unwrap();
let mut runner = TestRunner::default();
for _ in 0..100_000_000 {
@raphlinus
raphlinus / max.hlsl
Last active June 9, 2023 16:41
MetalLibraryExplorer dump of metal-shaderconverter output
ByteAddressBuffer input;
RWByteAddressBuffer output;
groupshared uint max_value;
[numthreads(256, 1, 1)]
void main(uint index: SV_GroupIndex) {
if (index == 0) {
max_value = 0;
}
@raphlinus
raphlinus / simplify_svg.rs
Created April 13, 2023 19:18
Test running code for path simplification
// Adapted from code provided by @rachael-wang
use std::{env, fs};
fn main() {
let args: Vec<String> = env::args().collect();
let svg_contents = fs::read_to_string(&args[1]).unwrap();
let curve = kurbo::BezPath::from_svg(&svg_contents).expect("SVG parse error");
@raphlinus
raphlinus / subgroup.md
Last active March 10, 2023 23:00
Draft subgroup issue for gpuweb

Considerations for subgroups

One feature that is clearly out of scope for WebGPU 1.0 but desired for the near future is subgroups. Subgroup operations move data between threads within a workgroup with less overhead and latency than workgroup shared memory. Almost all modern GPU hardware supports them, but the feature poses significant portability challenges. In particular, while workgroup size is chosen by the programmer within generous ranges (WebGPU requires a minimum maximum of 256), subgroup sizes vary by hardware and also by compiler heuristics. Shaders need to be written in a way that adapts to a wide range of subgroup sizes, which is quite challenging.
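
To make the adaptation problem concrete, here is a minimal CPU-side sketch in Rust (not shader code, and not taken from any existing implementation) of a workgroup-wide inclusive prefix sum built from subgroup-sized pieces. The function name `workgroup_inclusive_scan` and its structure are assumptions for illustration only. The inner per-chunk scan stands in for a hardware subgroup scan, and the second pass stands in for the exchange through workgroup shared memory. The subgroup size is a runtime parameter, so the same code must give the same answer whichever size the hardware and compiler pick.

```rust
/// Hypothetical CPU simulation of a two-level inclusive prefix sum.
/// `input` models the values held by the threads of one workgroup;
/// `subgroup_size` models a size the shader author cannot fix in advance.
pub fn workgroup_inclusive_scan(input: &[u32], subgroup_size: usize) -> Vec<u32> {
    assert!(subgroup_size > 0, "subgroup size must be nonzero");
    let mut out = input.to_vec();

    // Stage 1: inclusive scan within each subgroup.
    // On a GPU this would be a single subgroup scan operation.
    for chunk in out.chunks_mut(subgroup_size) {
        let mut acc = 0u32;
        for x in chunk.iter_mut() {
            acc = acc.wrapping_add(*x);
            *x = acc;
        }
    }

    // Stage 2: add the running total of all preceding subgroups.
    // On a GPU the per-subgroup totals would pass through shared memory.
    let mut offset = 0u32;
    for chunk in out.chunks_mut(subgroup_size) {
        // Capture this subgroup's local total before it is offset.
        let total = *chunk.last().expect("chunks are never empty");
        for x in chunk.iter_mut() {
            *x = x.wrapping_add(offset);
        }
        offset = offset.wrapping_add(total);
    }
    out
}
```

With a workgroup of 256 values, a subgroup size of 64 leaves 4 subgroup totals to combine in the second stage, while a subgroup size of 4 leaves 64. That variation in the amount of cross-subgroup work is one reason a single fixed shader structure is hard to get right for all hardware.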

This issue will be written largely from the perspective of accelerating prefix sum operations (an important primitive within Vello), but there are many potential applications. One relatively recent development is cooperative matrix operations, which are supported in most newer GPU hardware a

@raphlinus
raphlinus / xcode_crash_log.txt
Created January 16, 2023 17:02
Xcode 14 crash profiling Vello
-------------------------------------
Translated Report (Full Report Below)
-------------------------------------
Process: Xcode [50519]
Path: /Applications/Xcode.app/Contents/MacOS/Xcode
Identifier: com.apple.dt.Xcode
Version: 14.2 (21534)
Build Info: IDEFrameworks-21534000000000000~49 (14C18)
App Item ID: 497799835
@raphlinus
raphlinus / barrier_test.swift
Last active November 6, 2022 17:04
Minimal metal repro of piet-gpu#199
import Foundation
import Metal
// This is translated by naga from tile_alloc.wgsl
// naga --buffer-bounds-check-policy ReadZeroSkipWrite tile_alloc.wgsl tile_alloc.metal
let metalProgram = """
// language: metal2.0
#include <metal_stdlib>
#include <simd/simd.h>