Camel Coder camel-cdr

@camel-cdr
camel-cdr / rv-vs-arm-chibicc.md
Created March 4, 2025 21:43
RVA23 vs ARMv9 a Small Experiment


I was curious to see how RISC-V and ARM compare in terms of dynamic instruction count and code density, so I devised a small experiment to compare the ISAs.

As a test codebase, I chose the chibicc C compiler, because it's a medium-sized project and is quite easy to compile. To benchmark chibicc, I just used it to compile itself, which should be a fairly realistic workload for simulating a complex, non-regular application. I merged all files into one and made some minor modifications; the code can be found at: https://godbolt.org/z/xr3nEW8Wf

You may notice that I added unoptimized scalar implementations of the mem* and str* functions from musl-libc. This is because I decided not to include SIMD code in this experiment, in an effort to reduce the number of unknown variables and focus on comparing the base ISAs.
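
To illustrate what "unoptimized scalar" means here, the following is a minimal sketch of byte-at-a-time versions of two such functions. The names `sketch_memcpy` and `sketch_strlen` are hypothetical and chosen to avoid clashing with libc; this is not the code used in the experiment, only an example of the style of loop that keeps SIMD and word-at-a-time tricks out of the measurement.

```c
#include <stddef.h>

/* Hypothetical illustration, not the implementation used in the experiment. */

/* Copy n bytes one at a time: no word-sized accesses, no SIMD.
 * As with memcpy, the two regions are assumed not to overlap. */
void *sketch_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Count the bytes before the terminating NUL, one byte per iteration. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}
```

Loops like these compile to plain loads, stores, and branches on both ISAs, so any difference in instruction count comes from the base instruction sets rather than from hand-tuned vector routines.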

@camel-cdr
camel-cdr / dist_circle.h
Last active February 1, 2025 21:06
Using the Ziggurat Method for Sampling Random Coordinates From a Unit Circle
/*
* Using the Ziggurat Method for Sampling Random Coordinates From a Unit Circle
* by Olaf Bernstein <[email protected]>.
* Distributed under the MIT license, see license at the end of the file.
* New versions available at https://github.com/camel-cdr/cauldron
*
* Table of contents ===========================================================
*
* 1. Introduction
* 1.1 API Overview
@camel-cdr
camel-cdr / rvv-gap.md
Last active March 23, 2025 19:58
RISC-V Vector Extension for Integer Workloads: An Informal Gap Analysis


Note: To verify my RVI membership and identity on this otherwise semi-anonymous account: I'm Olaf Bernstein; you should be able to view my sig-vector profile if you are a member of the vector SIG.

The goal of this document is to explore gaps in the current RISC-V Vector extensions (standard V, Zvbb, Zvbc, Zvkg, Zvkn, Zvks), and suggest instructions to fill these gaps. My focus lies on application class processors, with the expectation that suggested instructions would be suitable to become mandatory or optional instructions in future profiles.

I'll assume you are already familiar with RVV; if not, here is a great introduction, and here is the latest RISC-V ISA manual.

@camel-cdr
camel-cdr / README.md
Created May 21, 2024 16:24
RISC-V benchmark: spilling GPRs to different locations

cycles for 128 iterations of a spilling function (see complex_reduction):

                  XiangShan            XuanTie C908           SpacemiT X60
             5 spills | 14 spills | 5 spills | 14 spills | 5 spills | 14 spills
stack:           2309 |      3439 |     6898 |     18220 |     6693 |     17734
fp:              3193 |      7037 |     8483 |     32325 |     8248 |     31434
rvv_best:        3210 |      7095 |     8459 |     32343 |     8250 |     31448
rvv_zvl128b:      N/A |      7837 |     9532 |     36685 |     9290 |     35550
rvv_worst_merge: 4572 |     23013 |    12042 |     50894 |    11722 |     49232
rvv_worst_slide:  N/A |     36385 |    12975 |     55166 |    12379 |     53113
@camel-cdr
camel-cdr / README.md
Last active May 28, 2024 22:41
Implement LMUL=8 vcompress.vm using existing LMUL=1 RVV primitives

The implementation complexity of vcompress.vm for large vector lengths and higher LMUL has been somewhat debated.

Existing implementations exhibit very poor scaling when dealing with larger operands:

      VLEN  e8m1  e8m2  e8m4  e8m8
c906   128     3    10    32   136
c908   128     3    10    32   139
c920   128   0.5   2.4   5.4  20.0
X60    256     3    10    32   139
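
For readers less familiar with the instruction, here is a minimal scalar sketch of what vcompress.vm computes for 8-bit elements: the source elements whose mask bit is set are packed contiguously at the start of the destination. The function name `vcompress_e8_model` and its signature are hypothetical; this models the architectural result only and says nothing about how the gist builds LMUL=8 from LMUL=1 primitives.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference model of vcompress.vm for e8 elements.
 * mask_bits holds one bit per element, least significant bit first
 * (element i is bit i%8 of byte i/8), matching the RVV mask layout.
 * dst must not overlap src or mask_bits, mirroring the instruction's
 * register-overlap restriction. Returns the number of elements written;
 * dst elements beyond that count are left unchanged here. */
size_t vcompress_e8_model(uint8_t *dst, const uint8_t *src,
                          const uint8_t *mask_bits, size_t vl)
{
    size_t out = 0;
    for (size_t i = 0; i < vl; i++) {
        if ((mask_bits[i / 8] >> (i % 8)) & 1)
            dst[out++] = src[i];   /* keep selected element, in order */
    }
    return out;
}
```

Each output position depends on the number of set mask bits before it, and at higher LMUL that dependency spans the whole register group, which is one reason the implementation cost at large LMUL is debated.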
@camel-cdr
camel-cdr / Dockerfile
Created August 27, 2023 08:10
Simulating Tenstorrent ocelot (now bobcat?) rvv 1.0 core based on SonicBOOM
FROM continuumio/miniconda3
RUN apt-get update \
&& apt-get install -y build-essential wget git unzip python3 sudo file python3-vcstools libboost-dev vim cpio binutils \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN conda install conda-lock=1.4
RUN git clone https://github.com/tenstorrent/chipyard/
@camel-cdr
camel-cdr / rvv-shishua.S
Last active February 21, 2025 02:21
rvv shishua
.global shishua_rvv # void shishua_rvv (uint64_t state[4], void *dest, size_t n)
shishua_rvv:
# load state (can easily be expanded to state[8] or state[16])
vsetvli t6, x0, e64, m2, ta, ma
ld a4, 0(a0)
vmv.v.x v0, a4
ld a4, 8(a0)
vmv.v.x v4, a4
ld a4, 16(a0)
vmv.v.x v8, a4
@camel-cdr
camel-cdr / rvv-rollback.S
Last active July 28, 2023 17:04
Assembly macros to aid in writing rvv 1.0 and rvv 0.7 compatible code
# rvv-rollback.S -- A minimal benchmarking library
# Olaf Bernstein <[email protected]>
# Distributed under the MIT license, see license at the end of the file.
# New versions available at https://gist.github.com/camel-cdr/cfd9ba2b8754b521edf4892fe19c7031
# Conversions taken from https://github.com/RISCVtestbed/rvv-rollback
.macro vle32.v a:vararg
vlw.v \a
.endm
.macro vle16.v a:vararg
@camel-cdr
camel-cdr / detect-macchanger.c
Last active January 8, 2023 19:27
detects if a MAC address was generated using macchanger --random
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#ifdef _OPENMP
#include <omp.h>
#endif
/*
* macchanger --random uses the following code to generate random MAC addresses:
@camel-cdr
camel-cdr / ppmp-bootstrap.c
Last active March 6, 2022 22:55
C preprocessor continuation machine with some helper macros
#define XXX_CM_UP_0(P,a,b,c,d,e,f,...) XXX_CM_PASS_UP_1 (XXX_CM_PASS_DN_0 (XXX__##f(,P##a,P##b,P##c,P##d,P##e,P##__VA_ARGS__)))
#define XXX_CM_UP_1(P,a,b,c,d,e,f,...) XXX_CM_PASS_UP_2 (XXX_CM_PASS_DN_1 (XXX__##f(,P##a,P##b,P##c,P##d,P##e,P##__VA_ARGS__)))
#define XXX_CM_UP_2(P,a,b,c,d,e,f,...) XXX_CM_PASS_UP_3 (XXX_CM_PASS_DN_2 (XXX__##f(,P##a,P##b,P##c,P##d,P##e,P##__VA_ARGS__)))
#define XXX_CM_UP_3(P,a,b,c,d,e,f,...) XXX_CM_PASS_UP_4 (XXX_CM_PASS_DN_3 (XXX__##f(,P##a,P##b,P##c,P##d,P##e,P##__VA_ARGS__)))
#define XXX_CM_UP_4(P,a,b,c,d,e,f,...) XXX_CM_PASS_UP_5 (XXX_CM_PASS_DN_4 (XXX__##f(,P##a,P##b,P##c,P##d,P##e,P##__VA_ARGS__)))
#define XXX_CM_UP_5(P,a,b,c,d,e,f,...) XXX_CM_PASS_UP_6 (XXX_CM_PASS_DN_5 (XXX__##f(,P##a,P##b,P##c,P##d,P##e,P##__VA_ARGS__)))
#define XXX_CM_UP_6(P,a,b,c,d,e,f,...) XXX_CM_PASS_UP_7 (XXX_CM_PASS_DN_6 (XXX__##f(,P##a,P##b,P##c,P##d,P##e,P##__VA_ARGS__)))
#define XXX_CM_UP_7(P,a,b,c,d,e,f,...) XXX_CM_PASS_UP_8 (XXX_CM_PASS_DN_7 (XXX__##f(,P##a,P##b,P##c,P##d,P##e,P##__VA_AR