Skip to content

Instantly share code, notes, and snippets.

why doesn't radfft support AVX on PC?

So there's two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.

Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang) that make all SSE code emitted also use VEX encoding, and at the time radfft was written there was no way in CDep to set compiler flags for just one file, just for the overall build.

[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]

The other issue is to do with CPU power management.

@porglezomp
porglezomp / Leftpad.idr
Last active August 13, 2024 22:22
Taking on Hillel's challenge to formally verify leftpad (https://twitter.com/Hillelogram/status/987432181889994759)
-- Note: There is a more complete explanation at https://github.com/hwayne/lets-prove-leftpad/tree/master/idris
import Data.Vect
-- `minus` is saturating subtraction, so this works like we want it to
eq_max : (n, k : Nat) -> maximum k n = plus (n `minus` k) k
eq_max n Z = rewrite minusZeroRight n in rewrite plusZeroRightNeutral n in Refl
eq_max Z (S _) = Refl
eq_max (S n) (S k) = rewrite sym $ plusSuccRightSucc (n `minus` k) k in rewrite eq_max n k in Refl
-- The type here says "the result is" padded to (maximum k n), and is padding plus the original
@raphlinus
raphlinus / parse.cu
Created April 23, 2018 19:31
sketch of prefix sum to do backslash unescaping in cuda
// Copyright 2018 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
open FStar.Char
open FStar.Seq
(* Helper *)
let max a b = if a > b then a else b
(* Definition *)
let leftpad (c:char) (n:nat) (s:seq char) : seq char =
let pad = max (n - length s) 0 in
append (create pad c) s
theory Leftpad
imports Main
begin
definition leftPad :: "'a ⇒ nat ⇒ 'a list ⇒ 'a list"
where "leftPad padChar targetLength s ≡ replicate (targetLength - length s) padChar @ s"
definition isPadded :: "'a ⇒ 'a list ⇒ 'a list ⇒ bool"
where "isPadded padChar unpadded padded ≡ ∃ n. set (take n padded) ⊆ { padChar } ∧ drop n padded = unpadded"
module Unique where
-- I deal with naturals instead of integers. with a bit more effort, I could
-- probably have parameterized this over any type with decidable equality.
open import Data.Nat
open import Data.Product
open import Data.List hiding (any)
open import Data.List.Any
open import Relation.Nullary
open import Relation.Binary.PropositionalEquality
@rntz
rntz / leftpad.agda
Created April 21, 2018 21:10
verified left-pad in agda
module HillelVerificationProblems where
open import Data.Nat
open import Data.Nat.Properties.Simple using (+-right-identity; +-comm)
open import Data.List
open import Data.List.All
open import Data.List.Properties
open import Relation.Binary.PropositionalEquality
-- NB. (a ∸ b) is saturating subtraction and (a ⊔ b) is max(a,b).
@seanjensengrey
seanjensengrey / octal_x86.txt
Last active December 29, 2025 08:07
x86 is an octal machine
# source:http://geocities.com/SiliconValley/heights/7052/opcode.txt
From: [email protected] (Mark Hopkins)
Newsgroups: alt.lang.asm
Subject: A Summary of the 80486 Opcodes and Instructions
(1) The 80x86 is an Octal Machine
This is a follow-up and revision of an article posted in alt.lang.asm on
7-5-92 concerning the 80x86 instruction encoding.
The only proper way to understand 80x86 coding is to realize that ALL 80x86
@nlgxzef
nlgxzef / NFSMW-PerformanceTuning.ct
Created February 23, 2018 16:47
NFSMW - Pushing the limits of performance tuning further!
<?xml version="1.0" encoding="utf-8"?>
<CheatTable CheatEngineTableVersion="24">
<CheatEntries>
<CheatEntry>
<ID>0</ID>
<Description>"Minimum Slider Values (-1 = Default)"</Description>
<VariableType>Float</VariableType>
<Address>008A9350</Address>
</CheatEntry>
<CheatEntry>
@shafik
shafik / WhatIsStrictAliasingAndWhyDoWeCare.md
Last active December 25, 2025 23:54
What is Strict Aliasing and Why do we Care?

What is the Strict Aliasing Rule and Why do we care?

(OR Type Punning, Undefined Behavior and Alignment, Oh My!)

What is strict aliasing? First we will describe what is aliasing and then we can learn what being strict about it means.

In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is classified as undefined behavior(UB). Once we have undefined behavior all bets are off, the results of our program are no longer reliable.

Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the a future version of a compiler with a new optimization will break code we th