backend: metal, device: Intel(R) Iris(TM) Plus Graphics 640
metal-threadgroup-Intel(R) Iris(TM) Plus Graphics 640
kernel type: threadgroup
cpu_execs: 2, gpu_execs: 5001
transpose-threadgroup-WGS=(1,32) kernel already compiled...
num bms: 4096, num dispatch groups: 4096
GPU results verified!
task name:metal-threadgroup-WGS=(32, 32)
TG size: 32
timestamp stats (N = 2): 0.00 +/- 0.00 ms
@raphlinus
raphlinus / timing_results_hybrid_shuffle.txt
Created February 24, 2020 03:22
mac results on transpose-timing-tests (git hash 781dcf54fc8f32fa2acf54c7a0261defe09ef1be)
compiling kernel transpose-hybrid-shuffle-WGS=(32,1)...
num bms: 4096, num dispatch groups: 4096
GPU results verified!
task name:Vk-HybridShuffle-TG=32
device: Intel(R) Iris(TM) Plus Graphics 640
num BMs: 4096, TG size: 32
CPU loops: 101, GPU loops: 1001
timestamp stats (N = 101): 0.00 +/- 0.00 ms
instant stats (N = 101): 108.47 +/- 8.75 ms
raphlinus / bitmagic.py
Created February 12, 2020 04:19
A Python scratch file used in support of working out piet-gpu kernels
def ctz(x):
    if x == 0: return 32
    r = 0
    while (x % 2) == 0:
        r += 1
        x >>= 1
    return r
def clz(x):
    for k in range(31, -1, -1):
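The clz preview above is cut off after its loop header. As a hedged illustration only, here is a minimal 32-bit count-leading-zeros in the same style, assuming it follows ctz's convention of returning 32 for zero. It is a reconstruction, not necessarily the gist's code. The `clz_bit_length` helper is a hypothetical cross-check using Python's built-in `int.bit_length()`.

```python
def clz(x):
    """Count leading zeros in a 32-bit value, returning 32 for x == 0
    (assumed to match ctz's convention)."""
    # Scan from the most significant bit down; the first set bit at
    # position k means there are 31 - k zero bits above it.
    for k in range(31, -1, -1):
        if x & (1 << k):
            return 31 - k
    return 32


def clz_bit_length(x):
    """Cross-check using int.bit_length(): for 0 <= x < 2**32,
    leading zeros = 32 - (number of significant bits)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()
```

The scanning loop mirrors how one would reason about the operation bit by bit, while the `bit_length` form is the idiomatic Python shortcut. On a GPU, either would normally be replaced by a hardware intrinsic.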
raphlinus / kernel-2.md
Last active February 11, 2020 23:08
Description of fancy subgroup piet-metal kernel 2

Kernel 2 processes all the segments in the fill and stroke items. Here we'll concentrate on fill (stroke is similar).

Its input is a list of fill items for this tilegroup, produced by kernel 1. It also has access to the scene, both for the items themselves and for their lists of points.

Its output is, for each item, a background fill and a list of segments. (There's a potential complication: segments can be either "fill" or "fill edge".)

This note refers to the piet-metal source extensively. For the most part, it covers the PietItem_Fill case (lines 248..365).

We make some simplifications. We treat the item list as a vec, with len and index operations; in practice it is likely to be fragmented, to make dynamic allocation easier for kernel 1. We also write the output code in pseudocode (it will need similar dynamic allocation tricks).
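The note treats the item list as a plain vec while expecting it to be fragmented in practice. Here is a minimal Python sketch of what "len and index over a fragmented list" could look like, assuming fixed-size chunks tracked in a table. The `ChunkedList` name, `CHUNK_SIZE`, and the layout are hypothetical illustrations, not piet-metal's actual format. A GPU version might instead link chunks together or bump-allocate them from a shared buffer.

```python
CHUNK_SIZE = 4  # hypothetical; a real kernel would size chunks to suit its allocator


class ChunkedList:
    """A list stored as fixed-size chunks, so a producer (like kernel 1)
    can grow it by allocating one chunk at a time."""

    def __init__(self):
        self.chunks = []
        self.length = 0

    def push(self, item):
        if self.length % CHUNK_SIZE == 0:
            # Current chunk is full (or none exists yet): allocate a new one.
            self.chunks.append([None] * CHUNK_SIZE)
        self.chunks[-1][self.length % CHUNK_SIZE] = item
        self.length += 1

    def __len__(self):
        return self.length

    def __getitem__(self, ix):
        if not 0 <= ix < self.length:
            raise IndexError(ix)
        # A logical index maps to (chunk, offset) with a divide and a modulo.
        return self.chunks[ix // CHUNK_SIZE][ix % CHUNK_SIZE]
```

The consumer (kernel 2 here) only needs `len` and indexing, which is why the note can treat the list as a vec without loss of generality.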

raphlinus / piet-gpu-simple-kernel-1.metal
Created February 10, 2020 18:22
Non-subgroup version of graph traversal
struct StackElement {
    PietGroupRef group;
    uint index;
    float2 offset; // Maybe pack as short2?
};
kernel1(Buf scene, PietGroupRef root) {
    StackElement stack[MAX_STACK];
    uint stack_ix = 0;
    uint group = root;
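The simple kernel 1 walks the scene graph with an explicit stack of (group, index, offset) elements rather than recursion, which GPU shading languages generally don't support. Here is a minimal Python sketch of that traversal pattern. It assumes a hypothetical scene where a group is a list of children and each child is either a leaf `Item` or a `Translate` wrapping a sub-group. `MAX_STACK`, `Item`, `Group`, and `Translate` are illustrative assumptions, not piet-gpu types.

```python
from dataclasses import dataclass

MAX_STACK = 8  # hypothetical fixed depth; GPU kernels need a static bound


@dataclass
class Item:
    name: str


@dataclass
class Group:
    children: list  # each child is an Item or a Translate


@dataclass
class Translate:
    offset: tuple  # (x, y) added to the parent's offset
    group: Group


def traverse(root):
    """Return (item_name, absolute_offset) for every leaf, depth-first,
    using an explicit stack instead of recursion."""
    out = []
    stack = []  # saved (group, next child index, offset), like StackElement
    group, index, offset = root, 0, (0.0, 0.0)
    while True:
        if index < len(group.children):
            child = group.children[index]
            index += 1
            if isinstance(child, Item):
                out.append((child.name, offset))
            else:
                # Descend: remember where to resume in the parent.
                if len(stack) >= MAX_STACK:
                    raise RuntimeError("scene nesting exceeds MAX_STACK")
                stack.append((group, index, offset))
                group = child.group
                offset = (offset[0] + child.offset[0], offset[1] + child.offset[1])
                index = 0
        elif stack:
            # This group is exhausted: resume the parent.
            group, index, offset = stack.pop()
        else:
            return out
```

Storing the next child index in each stack element is what lets the loop resume a parent exactly where it left off, which is the role `index` plays in `StackElement`.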
raphlinus / bitmap_transpose.metal
Created February 8, 2020 21:11
32x32 matrix transpose in metal using subgroups
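// tix: this thread's lane index within the SIMD group, passed in explicitly
// because Metal only exposes it as a kernel argument attribute.
inline uint shuffle_round(uint a, uint m, ushort s, ushort tix) {
    uint b = simd_shuffle_xor(a, s);
    uint c;
    if ((tix & s) == 0) {
        c = b << s;
    } else {
        m = ~m;
        c = b >> s;
    }
    return (a & m) | (c & ~m);
}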
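Here is a Python simulation of a full 32x32 bit-matrix transpose built from rounds like the one above. Each of 32 simulated lanes holds one row as a 32-bit integer, and `rows[tix ^ s]` stands in for `simd_shuffle_xor`. The calling code isn't shown, so the mask values (bit positions j with `(j & s) == 0`) and the 16, 8, 4, 2, 1 round order are assumptions. Each round swaps the off-diagonal s x s blocks, which exchanges bit s of the row index with bit s of the column index.

```python
MASK32 = 0xFFFFFFFF

# Keep-mask per round: bit positions j where (j & s) == 0 (assumed values).
ROUND_MASKS = {16: 0x0000FFFF, 8: 0x00FF00FF, 4: 0x0F0F0F0F, 2: 0x33333333, 1: 0x55555555}


def shuffle_round(rows, m, s):
    """One round across all 32 simulated lanes; every lane reads the old
    values, as a simultaneous SIMD shuffle would."""
    out = []
    for tix, a in enumerate(rows):
        b = rows[tix ^ s]  # stands in for simd_shuffle_xor(a, s)
        if (tix & s) == 0:
            keep = m
            c = (b << s) & MASK32  # uint32 shifts truncate, so mask here
        else:
            keep = ~m & MASK32
            c = b >> s
        out.append((a & keep) | (c & ~keep & MASK32))
    return out


def transpose32(rows):
    for s in (16, 8, 4, 2, 1):
        rows = shuffle_round(rows, ROUND_MASKS[s], s)
    return rows


def transpose32_naive(rows):
    """Reference: bit j of output row i is bit i of input row j."""
    return [sum(((rows[j] >> i) & 1) << j for j in range(32)) for i in range(32)]
```

Because each round swaps a different bit of the row and column indices, the rounds commute. After all five, every element (i, j) has moved to (j, i).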
raphlinus / piet-gpu-fancy-kernel-1.metal
Last active February 8, 2020 20:19
Pseudocode of the fancy (subgroup) version of piet-gpu's kernel 1 (simplified)
struct StackElement {
    PietGroupRef group;
    uint index;
    float2 offset; // Maybe pack as short2?
};
kernel1(Buf scene, PietGroupRef root) {
    StackElement stack[MAX_STACK];
    uint stack_ix = 0;
    uint tos_group = root;
raphlinus / gen.hlsl
Last active February 1, 2020 08:36
output of current piet-metal-derive on piet-dx12 scene
inline uint extract_8bit_value(uint bit_shift, uint package) {
    uint mask = 255;
    uint result = (package >> bit_shift) & mask;
    return result;
}
inline uint extract_16bit_value(uint bit_shift, uint package) {
    uint mask = 65535;
    uint result = (package >> bit_shift) & mask;
    return result;
}
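The generated helpers unpack fixed-width fields from a 32-bit word with a shift and a mask. Here is a small Python sketch of the same arithmetic, plus a hypothetical `pack_fields` inverse. `pack_fields` is not part of piet-metal-derive's output; it only shows how a host-side encoder might produce such words.

```python
def extract_8bit_value(bit_shift, package):
    """Mirror of the generated HLSL: shift the field down, keep 8 bits."""
    return (package >> bit_shift) & 0xFF


def extract_16bit_value(bit_shift, package):
    """Same idea with a 16-bit mask (65535)."""
    return (package >> bit_shift) & 0xFFFF


def pack_fields(fields):
    """Hypothetical inverse: pack (value, bit_shift, width) triples into one
    32-bit word. Does not check for overlapping fields."""
    package = 0
    for value, bit_shift, width in fields:
        mask = (1 << width) - 1
        if value & ~mask:
            raise ValueError(f"{value:#x} does not fit in {width} bits")
        package |= value << bit_shift
    return package & 0xFFFFFFFF
```

For example, packing 0x12 at bit 0, 0x34 at bit 8, and 0xBEEF at bit 16 yields 0xBEEF3412. Each extract call then recovers its field.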
raphlinus / kubelka.rs
Created January 22, 2020 18:38
Code for testing Kubelka-Munk based compositing
use std::path::Path;
use std::fs::File;
use std::io::BufWriter;
#[derive(Clone, Copy)]
struct LinearRGB {
    r: f32,
    g: f32,
    b: f32,
}
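The kubelka.rs preview stops at the `LinearRGB` type, so its actual compositing formula isn't visible here. As a hedged illustration only, here is a Python sketch of the textbook single-constant Kubelka-Munk model applied independently to each linear RGB channel. For an opaque layer with reflectance R, K/S = (1 - R)^2 / (2R); mixtures average K/S, and the inverse maps back to reflectance. The function names and the per-channel simplification are assumptions, not necessarily what the gist does.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearRGB:
    r: float
    g: float
    b: float


EPS = 1e-6  # avoid division by zero for perfectly black channels


def k_over_s(refl):
    """Absorption/scattering ratio of an opaque layer with reflectance refl."""
    refl = min(max(refl, EPS), 1.0)
    return (1.0 - refl) ** 2 / (2.0 * refl)


def reflectance(ks):
    """Inverse of k_over_s. Algebraically equal to 1 + ks - sqrt(ks^2 + 2ks),
    written in this form to avoid cancellation for large ks."""
    return 1.0 / (1.0 + ks + math.sqrt(ks * ks + 2.0 * ks))


def km_mix(a, b, t):
    """Mix colors a and b per channel in K/S space, with weight t for b."""
    def mix(x, y):
        return reflectance((1.0 - t) * k_over_s(x) + t * k_over_s(y))
    return LinearRGB(mix(a.r, b.r), mix(a.g, b.g), mix(a.b, b.b))
```

Compared with linear interpolation, mixing in K/S space makes a yellow and a blue come out greenish and darker, closer to how physical pigments combine.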
raphlinus / druidwinter2020.md
Created December 29, 2019 17:26
Winter status update & 0.5 Roadmap

Goals & Non-goals

Development of druid is currently driven by the needs of [Runebender], a font editor, and this will continue to be true for the scope of this roadmap. Runebender is a creative desktop application, supporting Windows, macOS, and Linux (via Gtk).

A major goal for Runebender, and thus druid, is to offer a polished user experience. There are many factors to this goal, including performance, a rich palette of interactions (thus a widget library to support them), and playing well with the native platform.

This last point deserves more explanation. The intent of druid is not that you can write a single program that will magically look and feel native on all supported platforms. It's questionable whether such a thing can be done, and chasing it leads to a "lowest common denominator" approach. Rather, the goal of druid is to make it possible to create an app which respects platform conventions and expectations around things like window management, menus