#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#define WIDTH_IN_BLOCKS 29
#define HEIGHT_IN_BLOCKS 28
#define PADDING 4
#define BLOCK_WIDTH (4 * 4)
#define BLOCK_HEIGHT (4 * 4)
global _time_load
global _cache_flush
global _run_attempt
extern _bools
extern _values
extern _pointers
section .text
N_BITS = 8
MASK = (1 << N_BITS) - 1

class Ternary:
    def __init__(self, ones, unknowns):
        self.ones = ones & MASK
        self.unknowns = unknowns & MASK
        assert (self.ones & self.unknowns) == 0, (bin(self.ones), bin(self.unknowns))

    def __add__(self, other):
# IDA (disassembler) and Hex-Rays (decompiler) plugin for Apple AMX
#
# WIP research. (This was edited to add more info after someone posted it to
# Hacker News. Click "Revisions" to see full changes.)
#
# Copyright (c) 2020 dougallj
# Based on Python port of VMX intrinsics plugin:
# Copyright (c) 2019 w4kfu - Synacktiv
Raw data. These were dumped from iPhones/iPads using wall-timers, not
perf-counters. They contain some likely issues and inconsistencies that
haven't been fully investigated. Mostly correct, but it's worth
double-checking anything odd. (For example, "TBL (two register table)"
can have better throughput than is listed sometimes, as can some other
three-operand SIMD things iirc.)

The goal is to find the fastest rate at which an instruction can run. If
there are multiple rows with the same label, the "correct" value is the
minimum. For example:
Usage: evaluate a ternary bitwise function with the values a=0xf0, b=0xcc, c=0xaa.
On Intel you can pass the result directly to VPTERNLOGD. On A64, look up the value
in the following tables to find a short, equivalent sequence of operations.

Entries selected for throughput, not latency (though generally they seem to be
optimal for both).
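
As an illustration of the usage note above, here is a minimal Python sketch that evaluates a few three-input bitwise functions at a=0xf0, b=0xcc, c=0xaa to obtain the 8-bit table value. The helper name `ternary_index` and the sample functions are hypothetical choices for this example, not part of the original tables.

```python
# The three "magic" inputs: together their bits enumerate all 8
# combinations of (a, b, c), so the result is the full truth table.
A, B, C = 0xF0, 0xCC, 0xAA
MASK = 0xFF


def ternary_index(f):
    """Return the 8-bit truth table of a three-input bitwise function.

    `f` is any Python callable built from &, |, ^ and ~.
    (Hypothetical helper name, used only for this sketch.)
    """
    # Mask to 8 bits because ~ yields negative integers in Python.
    return f(A, B, C) & MASK


xor3 = ternary_index(lambda a, b, c: a ^ b ^ c)
select = ternary_index(lambda a, b, c: (a & b) | (~a & c))
majority = ternary_index(lambda a, b, c: (a & b) | (a & c) | (b & c))

print(hex(xor3), hex(select), hex(majority))  # 0x96 0xca 0xe8
```

The printed values are the numbers to look up in the tables (or, on AVX-512, to use as the VPTERNLOGD immediate).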
I've only used a couple of entries and found them to be correct. Sorry if there are
errors. Note that SVE changed the operand order to bsl (why???), so that's svbsl.
Generally names are a mix between the opcodes and what I found readable (mostly
Usage: evaluate a ternary bitwise function with the values a=0xf0, b=0xcc, c=0xaa.
On AVX-512 you can pass the result directly to VPTERNLOGD. On other platforms,
look up the value in the following tables to find a short, equivalent sequence of
operations.

For A64/SVE/Neon see https://gist.github.com/dougallj/10c3ffdbd07229db2cc8b0430d7ccd39
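
To check that a sequence of operations really matches a given table value, one option is a small software model of a three-input truth-table lookup. The sketch below assumes the convention implied by the constants above: a supplies the high bit of the lookup index, b the middle bit, and c the low bit. The function name `ternlog32` is hypothetical, and this is a model for testing, not a description of any hardware implementation.

```python
def ternlog32(imm8, a, b, c):
    """Model a 32-bit three-input truth-table lookup, bit by bit.

    For each bit position, the bits of a, b and c form a 3-bit index
    (a is the high bit), and that index selects one bit of imm8.
    (Hypothetical helper, assuming the operand order described above.)
    """
    result = 0
    for bit in range(32):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result


# Arbitrary 32-bit test inputs.
x, y, z = 0xDEADBEEF, 0x12345678, 0x0F0F0F0F
M32 = 0xFFFFFFFF

# 0x96 is three-way XOR; 0xCA is bit select (a ? b : c).
assert ternlog32(0x96, x, y, z) == x ^ y ^ z
assert ternlog32(0xCA, x, y, z) == ((x & y) | (~x & z)) & M32
```

Feeding the model a=0xf0, b=0xcc, c=0xaa returns the table value itself in the low 8 bits, which is why those three constants work as a way to compute it.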
The tables here are:
* agx: "not" and all binary operations (as used in Apple GPUs, but possibly useful elsewhere):