I came across an article on optimizing SIMD code further, and to test it I modified your testbed and wanted to share the results.
I tested your template benchmark code using two SSE registers in parallel to break up this pattern:
load v0
process v0
store v0
load v1
process v1
store v1
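The interleaved variant of that pattern can be sketched as follows. This is only a minimal illustration, assuming an x86 target with SSE and a made-up kernel (`scale_interleaved`, which multiplies every float by a factor); it is not the benchmark code from the testbed. Both registers are loaded first, then processed, then stored, so the two chains are independent of each other.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Hypothetical kernel: multiply every float in data[0..n) by factor,
// handling 8 floats per iteration in two independent SSE registers.
inline void scale_interleaved(float* data, std::size_t n, float factor)
{
    assert(data != nullptr || n == 0);
    const __m128 f = _mm_set1_ps(factor);

    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        // load v0, load v1 (unaligned loads, so no alignment is assumed)
        __m128 v0 = _mm_loadu_ps(data + i);
        __m128 v1 = _mm_loadu_ps(data + i + 4);
        // process v0, process v1
        v0 = _mm_mul_ps(v0, f);
        v1 = _mm_mul_ps(v1, f);
        // store v0, store v1
        _mm_storeu_ps(data + i, v0);
        _mm_storeu_ps(data + i + 4, v1);
    }

    // Scalar tail for the remaining 0..7 elements.
    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```

Whether this ordering is actually faster depends on the compiler and the CPU: an optimizing compiler or an out-of-order core may already reorder the simple version, so it has to be measured.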
~/projects/bgfx/bgfx/examples/runtime $ ../../.build/linux32_gcc/bin/example-09-hdrDebug
../../../src/bgfx.cpp(685): BGFX init
../../../src/bgfx_p.h(781): BGFX ConstantBuffer 524288, 524280
../../../src/bgfx_p.h(781): BGFX ConstantBuffer 524288, 524280
../../../src/glcontext_glx.cpp(62): BGFX glX num configs 20
../../../src/glcontext_glx.cpp(70): BGFX ---
../../../src/glcontext_glx.cpp(84): BGFX glX 0/20 3: 8, 8 ( 8)
../../../src/glcontext_glx.cpp(84): BGFX glX 0/20 4: a, 8 ( 8)
../../../src/glcontext_glx.cpp(84): BGFX glX 0/20 5: 9, 8 ( 8)
../../../src/glcontext_glx.cpp(84): BGFX glX 0/20 6: c, 18 ( 18)
#include <xtl.h>
#include <xgraphics.h>
#include <stdio.h>
#include "openGL.h"
extern LPDIRECT3DDEVICE8 D3D_Device;
#define MAX_MAT_STACK_MODV 40
#define MAX_MAT_STACK_PROJ 40
I want to show how you can optimize the rendering functions without making changes at the upper level (like removing round
stuff and so on), but I also want to teach the ideas behind the optimizations I'm proposing. I have no ESP32 board to test my
changes, so I will try to teach the ideas and let others test the optimizations. And I don't want to write a fast software
texture mapper (again) without explaining why it's much faster than the current source code.
1.) optimizing color conversion
This should be done automatically by your compiler (hopefully; it has to be checked at the assembly level!), but some background:
multiplications and divisions by a power of 2 can be replaced by bit shifts. This means a division by 255 can be approximated by a
bit shift of 8 bits to the right, which is exactly a division by 256. So, looking at the first color conversion routine:
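To illustrate the shift idea independently of that routine, here is a minimal sketch. The function names are hypothetical and it assumes the value being divided is at most the product of two 8-bit channels (255 * 255). A plain shift by 8 divides by 256 and is therefore slightly too small for some inputs; adding a small correction term gives the exact result of `x / 255` while still using only shifts and additions.

```cpp
#include <cassert>
#include <cstdint>

// Plain shift: this is x / 256, only an approximation of x / 255.
inline std::uint32_t div255_approx(std::uint32_t x)
{
    return x >> 8;
}

// Shift with correction: equals x / 255 (truncating) for all x < 65280,
// which covers every product of two 8-bit values (max 255 * 255 = 65025).
inline std::uint32_t div255_exact(std::uint32_t x)
{
    assert(x < 65280u);
    return (x + 1u + (x >> 8)) >> 8;
}

// Example use: scale an 8-bit color channel by an 8-bit alpha value.
inline std::uint8_t scale_channel(std::uint8_t color, std::uint8_t alpha)
{
    const std::uint32_t product =
        static_cast<std::uint32_t>(color) * static_cast<std::uint32_t>(alpha);
    return static_cast<std::uint8_t>(div255_exact(product));
}
```

With the plain shift, full color at full alpha gives 254 instead of 255, which is why the correction term matters when the result has to stay exact. A compiler will usually turn a division by a constant into a multiply and shift on its own, so the generated assembly should be checked before hand-optimizing.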
#pragma once
#pragma warning(push)
#pragma warning(disable:4996)
#include <cassert>
#include <memory>
#include <vector>