#pragma once
#pragma warning(push)
#pragma warning(disable:4996)
#include <cassert>
#include <memory>
#include <vector>
I want to show how you can optimize the rendering functions without making changes at the upper level (like removing the round
stuff and so on), and I also want to teach the ideas behind the optimizations I'm proposing. I have no ESP32 board to test my
changes, so I will try to explain the ideas and let others test the optimizations. And I don't want to write a fast software
texture mapper (again) without explaining why it's much faster than the current source code.
1.) optimizing color conversion
this should be done automatically by your compiler (hopefully; this has to be checked at assembly level!), but some background:
multiplications and divisions by a power of 2 can be replaced by bit shifts. 255 is not a power of 2, so a division by 255 can only
be approximated by a bit shift of 8 bits to the right (which is exactly a division by 256). so looking at the first color conversion routine:
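The routine itself is not part of this excerpt. As a sketch of the shift idea only, assume a hypothetical 8-bit channel multiply such as `color * alpha / 255`. The function names below are made up for illustration and do not come from the original source. The plain `>> 8` variant is cheap but slightly off, and the third variant is a well-known shift-only form that gives the same result as the rounded division:

```cpp
#include <cassert>
#include <cstdint>

// Reference: scale an 8-bit channel by an 8-bit factor with a real
// division by 255, rounded to nearest. (Hypothetical helper.)
inline std::uint8_t mul255_exact(std::uint8_t c, std::uint8_t a) {
    return static_cast<std::uint8_t>((c * a + 127) / 255);
}

// Cheap approximation: ">> 8" divides by 256, not by 255, and truncates.
// The result can come out too small, e.g. 255 * 255 gives 254.
inline std::uint8_t mul255_shift(std::uint8_t c, std::uint8_t a) {
    return static_cast<std::uint8_t>((c * a) >> 8);
}

// Shift-only variant that matches the rounded division by 255
// for every pair of 8-bit inputs.
inline std::uint8_t mul255_fast(std::uint8_t c, std::uint8_t a) {
    unsigned t = c * a + 128u;
    return static_cast<std::uint8_t>((t + (t >> 8)) >> 8);
}
```

Whether any of this beats what the compiler already generates for `/ 255` still has to be checked in the assembly output for the target.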
@questor
questor / opengl.cpp
Created December 14, 2017 10:34
opengl-1 emulation layer for direct3d9
#include <xtl.h>
#include <xgraphics.h>
#include <stdio.h>
#include "openGL.h"
extern LPDIRECT3DDEVICE8 D3D_Device;
#define MAX_MAT_STACK_MODV 40
#define MAX_MAT_STACK_PROJ 40
@questor
questor / gist:5141764
Created March 12, 2013 10:14
bgfx error
~/projects/bgfx/bgfx/examples/runtime $ ../../.build/linux32_gcc/bin/example-09-hdrDebug
../../../src/bgfx.cpp(685): BGFX init
../../../src/bgfx_p.h(781): BGFX ConstantBuffer 524288, 524280
../../../src/bgfx_p.h(781): BGFX ConstantBuffer 524288, 524280
../../../src/glcontext_glx.cpp(62): BGFX glX num configs 20
../../../src/glcontext_glx.cpp(70): BGFX ---
../../../src/glcontext_glx.cpp(84): BGFX glX 0/20 3: 8, 8 ( 8)
../../../src/glcontext_glx.cpp(84): BGFX glX 0/20 4: a, 8 ( 8)
../../../src/glcontext_glx.cpp(84): BGFX glX 0/20 5: 9, 8 ( 8)
../../../src/glcontext_glx.cpp(84): BGFX glX 0/20 6: c, 18 ( 18)
@questor
questor / simd_test1
Created November 22, 2012 14:05
more sse tests with expression templates and latency hiding strategy
I came across an article on optimising SIMD code further, and to test it I've modified your testbed and wanted to share the results.
I tested your template benchmark code using two SSE registers in parallel to break up this pattern:
load v0
process v0
store v0
load v1
process v1
store v1
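A minimal sketch of what breaking up that pattern could look like, assuming the idea is to interleave two independent registers so that one can be worked on while the other is still in flight. The kernel (`out = in * scale + bias`) and the function name are hypothetical and are not taken from the testbed. It also assumes `n` is a multiple of 8:

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical kernel: out[i] = in[i] * scale + bias.
// Two SSE registers (8 floats) are handled per iteration, and the steps
// of v0 and v1 are interleaved instead of finishing v0 before starting v1.
inline void scale_bias_interleaved(const float* in, float* out,
                                   std::size_t n, float scale, float bias) {
    const __m128 s = _mm_set1_ps(scale);
    const __m128 b = _mm_set1_ps(bias);
    for (std::size_t i = 0; i < n; i += 8) {
        // load v0, load v1
        __m128 v0 = _mm_loadu_ps(in + i);
        __m128 v1 = _mm_loadu_ps(in + i + 4);
        // process v0, process v1 (the two chains do not depend on each other)
        v0 = _mm_mul_ps(v0, s);
        v1 = _mm_mul_ps(v1, s);
        v0 = _mm_add_ps(v0, b);
        v1 = _mm_add_ps(v1, b);
        // store v0, store v1
        _mm_storeu_ps(out + i, v0);
        _mm_storeu_ps(out + i + 4, v1);
    }
}
```

The compiler and an out-of-order CPU may already reorder the simple version in a similar way, so any gain has to be measured.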