- ROCKpro64 by PINE64
- Rockchip RK3399
- big.LITTLE architecture:
- quad-core Cortex-A53, CPU part 0xd03
- dual-core Cortex-A72, CPU part 0xd08
NOTE: "CPU part" identifies the core type: 0xd03 is the Cortex-A53 and 0xd08 is the Cortex-A72.
mbohun@rockpro64a:~$ cat /proc/cpuinfo | grep -E "processor|model name|CPU part"
processor : 0
CPU part : 0xd03
processor : 1
CPU part : 0xd03
processor : 2
CPU part : 0xd03
processor : 3
CPU part : 0xd03
processor : 4
CPU part : 0xd08
processor : 5
CPU part : 0xd08
mbohun@rockpro64a:~$
mbohun@rockpro64a:~$ cpupower frequency-info
analyzing CPU 1:
driver: cpufreq-dt
CPUs which run at the same hardware frequency: 0 1 2 3
CPUs which need to have their frequency coordinated by software: 0 1 2 3
maximum transition latency: 40.0 us
hardware limits: 408 MHz - 1.42 GHz
available frequency steps: 408 MHz, 600 MHz, 816 MHz, 1.01 GHz, 1.20 GHz, 1.42 GHz
available cpufreq governors: performance schedutil
current policy: frequency should be within 408 MHz and 1.42 GHz.
The governor "schedutil" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 1.01 GHz (asserted by call to kernel)
mbohun@rockpro64a:~$
mbohun@rockpro64a:~$ cpupower --cpu 4 frequency-info
analyzing CPU 4:
driver: cpufreq-dt
CPUs which run at the same hardware frequency: 4 5
CPUs which need to have their frequency coordinated by software: 4 5
maximum transition latency: 465 us
hardware limits: 408 MHz - 1.80 GHz
available frequency steps: 408 MHz, 600 MHz, 816 MHz, 1.01 GHz, 1.20 GHz, 1.42 GHz, 1.61 GHz, 1.80 GHz
available cpufreq governors: performance schedutil
current policy: frequency should be within 408 MHz and 1.80 GHz.
The governor "schedutil" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 816 MHz (asserted by call to kernel)
mbohun@rockpro64a:~$
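The same lookup can be done from a program instead of with grep. The following is a minimal sketch, assuming the /proc/cpuinfo layout shown in the session above ("processor" lines followed by "CPU part" lines); the function name `find_cpus_by_part` is hypothetical and not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scan /proc/cpuinfo-style text from `in` and record the index of every
 * logical CPU whose "CPU part" field equals `part` (for example 0xd08 for
 * the Cortex-A72). Returns the number of indices written to `out`, which
 * never exceeds `max_out`. */
int find_cpus_by_part(FILE *in, unsigned long part, int *out, int max_out)
{
    char line[256];
    int current = -1;   /* most recent "processor : N" value seen */
    int count = 0;

    while (fgets(line, sizeof line, in)) {
        char *colon = strchr(line, ':');
        if (!colon)
            continue;   /* blank or malformed line */

        if (strncmp(line, "processor", 9) == 0) {
            current = (int)strtol(colon + 1, NULL, 10);
        } else if (strncmp(line, "CPU part", 8) == 0 && current >= 0) {
            /* Base 16 accepts the leading "0x" used by the kernel. */
            unsigned long value = strtoul(colon + 1, NULL, 16);
            if (value == part && count < max_out)
                out[count++] = current;
        }
    }
    return count;
}
```

Called with a stream opened by `fopen("/proc/cpuinfo", "r")` and `part` set to 0xd08, this should report CPUs 4 and 5 on the board above.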
- Orange Pi 5 Pro by Orange Pi
- Rockchip RK3588S
- big.LITTLE architecture:
- quad-core A55 TODO: hex code
- quad-core A76 TODO: hex code
2. Use SIMD (NEON) Intrinsics:
The A72 has a very powerful NEON SIMD unit. If your code involves heavy number crunching (image processing, linear algebra, audio/video encoding), you can get a massive speedup by using NEON intrinsics to process multiple data points with a single instruction.
Example of a simple NEON intrinsic adding four floats at once:
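A minimal sketch of such an addition, assuming a compiler that provides `<arm_neon.h>`. The function name `add_floats` is hypothetical, and the scalar loop is an added fallback so the same file still builds on non-ARM hosts.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise out[i] = a[i] + b[i] for n floats. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four floats per iteration in 128-bit NEON registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load a[i..i+3] */
        float32x4_t vb = vld1q_f32(b + i);      /* load b[i..i+3] */
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add and store  */
    }
#endif
    /* Scalar tail, and the whole loop when NEON is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```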
The compiler's auto-vectorization with -O3 -mcpu=cortex-a72 is very good, but for maximum control and performance, hand-tuning with intrinsics is the way to go.
3. Cache Awareness:
The A72 has a larger and more sophisticated cache system (L1, L2) compared to the A53. Write cache-friendly code:
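One common reading of "cache-friendly" is to walk memory in the order it is laid out. The sketch below is an illustration under that assumption, with hypothetical function names: both functions compute the same sum over a row-major matrix, but the first touches consecutive addresses while the second jumps `cols` elements on every step and so uses each fetched cache line less effectively.

```c
#include <stddef.h>

/* Sum a rows x cols matrix stored row-major in one contiguous array,
 * visiting elements in memory order (unit stride). */
double sum_row_major(const double *m, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

/* Same data, visited column by column: each access is `cols` elements
 * away from the previous one, which tends to cause more cache misses
 * on large matrices. */
double sum_column_order(const double *m, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Timing the two on a matrix much larger than the L2 cache is a simple way to see the effect on a given board.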
Summary and Recommended Workflow
- Compile with the right flags (-O3 -mcpu=cortex-a72). This often gives the biggest gain for the least effort.
- Use taskset or sched_setaffinity to force your optimized application to run only on the A72 cores. This prevents the OS scheduler from migrating it to a slower A53 core.
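A minimal sketch of the sched_setaffinity route, assuming Linux with glibc and the RK3399 numbering shown earlier, where the A72 cores are CPUs 4 and 5. The helper names `a72_cpu_set` and `pin_to_a72` are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* required for cpu_set_t and sched_setaffinity */
#endif
#include <sched.h>

/* Fill `set` with CPUs 4 and 5 (assumed to be the A72 cores). */
void a72_cpu_set(cpu_set_t *set)
{
    CPU_ZERO(set);
    CPU_SET(4, set);
    CPU_SET(5, set);
}

/* Restrict the calling thread to the A72 cores.
 * Returns 0 on success, -1 on failure with errno set
 * (for example EINVAL on a machine without CPUs 4 and 5). */
int pin_to_a72(void)
{
    cpu_set_t set;
    a72_cpu_set(&set);
    return sched_setaffinity(0, sizeof set, &set);   /* pid 0 = caller */
}
```

Calling `pin_to_a72()` early in `main` applies the restriction from inside the program; from a shell, `taskset -c 4,5 ./app` has the same effect without code changes (`./app` is a placeholder).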