This is an idea for adding vector processing support to tiny RISC-style microcontroller CPUs (e.g. similar to FEMTORV32).
It could be especially beneficial for CPUs without a pipeline, where each instruction normally takes two or more clock cycles to complete.
The purpose of vector processing is to reduce the number of clock cycles required for each operation:
- Loop overhead (increment, compare, branch) is reduced, since several data elements are processed in each loop iteration.