Mārtiņš Možeiko mmozeiko

Fast MiniMax Polynomial Approximations of Sine and Cosine

Unexpected Uses for the Galois Field Affine Transformation Instruction

Intel added the Galois Field instruction set (GFNI) extensions to their Sunny Cove and Tremont cores. What’s particularly interesting is that GFNI is the only new SIMD extension that came with SSE and VEX/AVX encodings (in addition to EVEX/AVX512), to allow it to be supported on all future Intel cores, including those which don’t support AVX512 (such as the Atom line, as well as Celeron/Pentium branded “big” cores).

I suspect GFNI was aimed at accelerating SM4 encryption; however, one of its instructions can be used for many other purposes. The extension includes three instructions, but of particular interest here is the Affine Transformation instruction (GF2P8AFFINEQB), also known as bit-matrix multiply.
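To make "bit-matrix multiply" concrete, here is a minimal scalar sketch of what the instruction computes for a single byte, following the per-byte pseudocode in Intel's instruction reference. The names `parity8` and `gf2_affine_byte` are hypothetical and chosen for illustration. The real instruction applies this to every byte of a vector, using one 8x8 bit matrix per 64-bit lane.

```c
#include <stdint.h>

/* XOR of all eight bits of v (1 if an odd number of bits are set). */
static unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/*
 * Scalar model of GF2P8AFFINEQB for one byte (hypothetical helper).
 * matrix: 8x8 bit matrix packed into a qword; byte (7 - i) is the row
 *         that produces output bit i.
 * x:      input byte, treated as a vector of 8 bits.
 * imm8:   constant XORed into the result (the "affine" part).
 */
static uint8_t gf2_affine_byte(uint64_t matrix, uint8_t x, uint8_t imm8)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t row = (uint8_t)(matrix >> (8 * (7 - i)));
        unsigned bit = parity8(row & x) ^ ((imm8 >> i) & 1u);
        out |= (uint8_t)(bit << i);
    }
    return out;
}
```

With this layout, the matrix `0x0102040810204080` acts as the identity and `0x8040201008040201` reverses the bits within each byte, which is one of the non-cryptographic uses the instruction lends itself to.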

There have been various articles which discuss out-of-band

Synchronized Output

Synchronized output implements the feature inspired by iTerm2's synchronized output, except that it uses the well-known SM ? and RM ? sequences rather than the rarely used DCS. iTerm2 has since adopted the new syntax in place of DCS.

Semantics

When rendering the screen of the terminal, the emulator usually iterates through each visible grid cell and renders its current state. When applications update the screen at a higher frequency, this can cause tearing.

This mode attempts to mitigate that.
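From the application side, using the mode amounts to bracketing one frame's worth of output with the set and reset sequences. The sketch below assumes DEC private mode number 2026, which is the number used by the published synchronized output specification but is not stated in this excerpt. The function name `sync_frame_build` and the macro names are hypothetical.

```c
#include <stdio.h>

/* Assumption: synchronized output is DEC private mode 2026. */
#define SYNC_OUTPUT_BEGIN "\x1b[?2026h" /* SM ?: start batching updates */
#define SYNC_OUTPUT_END   "\x1b[?2026l" /* RM ?: present the finished frame */

/*
 * Wraps `frame` in the begin/end sequences and stores the result in dst.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the result does not fit in `cap` bytes.
 */
static int sync_frame_build(char *dst, size_t cap, const char *frame)
{
    int n = snprintf(dst, cap, "%s%s%s",
                     SYNC_OUTPUT_BEGIN, frame, SYNC_OUTPUT_END);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The resulting buffer can be sent to the terminal in a single `fwrite` followed by `fflush`, so the emulator sees the whole frame between the two markers. Terminals that do not know the mode should ignore the unknown private mode sequences.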

ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads

Scalable Vector Extensions (SVE) is ARM’s latest SIMD extension to their instruction set, which was announced back in 2016. A follow-up SVE2 extension was announced in 2019, designed to incorporate all functionality from ARM’s current primary SIMD extension, NEON (aka ASIMD).

Despite being announced 5 years ago, there is currently no generally available CPU which supports any form of SVE (which excludes the [Fugaku supercomputer](https://www.fujitsu.com/global/about/innovation/

	static char*
	sys_clip_get(struct sys *s, Atom selection, Atom target)
	{
	assert(s);
	struct sys_x11 *x11 = s->platform;

	/* blocking wait for clipboard data */
	XEvent notify;
	XConvertSelection(x11->dpy, selection, target, selection, x11->helper, CurrentTime);
	while (!XCheckTypedWindowEvent(x11->dpy, x11->helper, SelectionNotify, &notify)) {

	struct vec3f {float x, y, z;};
	struct vec4f {float x, y, z, w;};
	struct mat44f {vec4f x, y, z, w;};

	//============================================================================
	// sphere_screen_extents
	//============================================================================
	// Calculates the exact screen extents xyzw=[left, bottom, right, top] in
	// normalized screen coordinates [-1, 1] for a sphere in view space. For
	// performance, the projection matrix (v2p) is assumed to be setup so that

	Perfect Quantization of DXT endpoints
	-------------------------------------

	One of the issues that affect the quality of most DXT compressors is the way floating point colors are rounded.

	For example, stb_dxt does:

	max16 = (unsigned short)(stb__sclamp((At1_ryy - At2_rxy)*frb+0.5f,0,31) << 11);
	max16 |= (unsigned short)(stb__sclamp((At1_gyy - At2_gxy)*fg +0.5f,0,63) << 5);
	max16 |= (unsigned short)(stb__sclamp((At1_byy - At2_bxy)*frb+0.5f,0,31) << 0);
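To illustrate why scale-and-round can pick a suboptimal endpoint, the sketch below compares it with a brute-force search over all 32 values of a 5-bit channel. It is not taken from the note above or from stb_dxt. It assumes the decoder expands 5 bits to 8 bits by bit replication; real decoders may expand differently, which changes the optimal choice. All function names are hypothetical.

```c
#include <math.h>

/* Assumed decoder behavior: expand 5 bits to 8 by replicating the top bits. */
static int expand5(int q)
{
    return (q << 3) | (q >> 2);
}

/* Scale-and-round quantization of c in [0, 1] to 5 bits. */
static int quantize5_round(float c)
{
    int q = (int)(c * 31.0f + 0.5f);
    return q < 0 ? 0 : (q > 31 ? 31 : q);
}

/* Brute force: the 5-bit value whose expanded 8-bit value is closest to c. */
static int quantize5_nearest(float c)
{
    float target = c * 255.0f;
    int best = 0;
    float best_err = fabsf((float)expand5(0) - target);
    for (int q = 1; q < 32; q++) {
        float err = fabsf((float)expand5(q) - target);
        if (err < best_err) {
            best_err = err;
            best = q;
        }
    }
    return best;
}
```

For an input of 0.0159, scale-and-round yields 0, while the search yields 1: under the assumed expansion, 1 decodes to 8/255, which is slightly closer to the input than 0 is.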

	#pragma use_dxc //enable SM 6.0 features, in Unity this is only supported on version 2020.2.0a8 or later with D3D12 enabled
	#pragma kernel CountTotalsInBlock
	#pragma kernel BlockCountPostfixSum
	#pragma kernel CalculateOffsetsForEachKey
	#pragma kernel FinalSort

	uint _FirstBitToSort;
	int _NumElements;
	int _NumBlocks;
	bool _ShouldSortPayload;

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <string.h>

	#if defined(__x86_64__)
	#define BREAK asm("int3")
	#else
	#error Implement macros for your CPU.
	#endif