Skip to content

Instantly share code, notes, and snippets.

View yupferris's full-sized avatar

Jake Taylor yupferris

View GitHub Profile
// Example: Opcode dispatch in a bytecode VM. Assume the opcode case dispatching is mispredict heavy,
// and that pc, ins, next_ins, next_opcase are always in registers.
#define a ((ins >> 8) & 0xFF)
#define b ((ins >> 16) & 0xFF)
#define c ((ins >> 24) & 0xFF)
// Version 1: Synchronous instruction fetch and opcode dispatch. The big bottleneck is that given how light
// the essential work is for each opcode case (e.g. something like ADD is typical), you're dominated
// by the cost of the opcode dispatch branch mispredicts. When there's a mispredict, the pipeline restarts
@pervognsen
pervognsen / shift_dfa.md
Last active November 1, 2024 23:53
Shift-based DFAs

A traditional table-based DFA implementation looks like this:

uint8_t table[NUM_STATES][256]

uint8_t run(const uint8_t *start, const uint8_t *end, uint8_t state) {
    for (const uint8_t *s = start; s != end; s++)
        state = table[state][*s];
    return state;
}
@0b5vr
0b5vr / vhs.hlsl
Last active May 26, 2023 03:58
Windows Terminal VHS Pixel Shader You Definitely Are Not Gonna Need This
/**
* (c) 2021 FMS_Cat, MIT License
* Original shader: https://www.shadertoy.com/view/MdffD7
* I dumbass don't know what it says despite it's my own shader
*/
Texture2D shaderTexture;
SamplerState samplerState;
cbuffer PixelShaderSettings {
@rygorous
rygorous / rast.c
Created March 2, 2020 01:56
Simple watertight triangle rasterizer
// ---- triangle rasterizer
#define SUBPIXEL_SHIFT 8
#define SUBPIXEL_SCALE (1 << SUBPIXEL_SHIFT)
static RADINLINE S64 det2x2(S32 a, S32 b, S32 c, S32 d)
{
S64 r = (S64) a*d - (S64) b*c;
return r >> SUBPIXEL_SHIFT;
}

Foreward

This document was originally written several years ago. At the time I was working as an execution core verification engineer at Arm. The following points are coloured heavily by working in and around the execution cores of various processors. Apply a pinch of salt; points contain varying degrees of opinion.

It is still my opinion that RISC-V could be much better designed; though I will also say that if I was building a 32 or 64-bit CPU today I'd likely implement the architecture to benefit from the existing tooling.

Mostly based upon the RISC-V ISA spec v2.0. Some updates have been made for v2.2

Original Foreword: Some Opinion

The RISC-V ISA has pursued minimalism to a fault. There is a large emphasis on minimizing instruction count, normalizing encoding, etc. This pursuit of minimalism has resulted in false orthogonalities (such as reusing the same instruction for branches, calls and returns) and a requirement for superfluous instructions which impacts code density both in terms of size and

68000 instructions timings

When I started to write some pure 68000 I didn't find a nice doc that would cover all the op codes I used which a nice diagram for the cycle counts so I made one. This info has been assembled / hacked up using this tool https://github.com/emoon/68k_documentation_gen with a bunch of manual work and some automation for generating the cycle tables. If you find any errors in this (I'm sure there are plenty but it has been useful for me) please contact me or even better do a PR :)

ABCD

Operation: Source10 + Destination10 + X → Destination

I was told by @mmozeiko that Address Sanitizer (ASAN) works on Windows now. I'd tried it a few years ago with no luck, so this was exciting news to hear.

It was a pretty smooth experience, but with a few gotchas I wanted to document.

First, download and run the LLVM installer for Windows: https://llvm.org/builds/

Then download and install the VS extension if you're a Visual Studio 2017 user like I am.

It's now very easy to use Clang to build your existing MSVC projects since there's a cl compatible frontend:

@rygorous
rygorous / ukkonen.rs
Created February 18, 2019 09:37
Trying to write Ukkonen's algorithm from memory in a language I don't know! Without tests! YOLO
use std::str;
// Store positions in packed (u32) form; this limits us to under 4GB of
// payload but makes the data structures a bit more compact.
struct PackedPos(u32);
impl PackedPos {
fn from(pos: usize) -> PackedPos {
assert!(pos <= std::u32::MAX as usize);
PackedPos(pos as u32)

Multi-dimensional array views for systems programmers

As C programmers, most of us think of pointer arithmetic for multi-dimensional arrays in a nested way:

The address for a 1-dimensional array is base + x. The address for a 2-dimensional array is base + x + y*x_size for row-major layout and base + y + x*y_size for column-major layout. The address for a 3-dimensional array is base + x + (y + z*y_size)*x_size for row-column-major layout. And so on.