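Linux build
Linux-5.14.15 configured with make defconfig, then make run twice; these are the numbers from the second run.
[J] is the energy in joules read from /sys/class/powercap/intel-rapl:0/energy_uj (this comes from a sensor built into the CPU, so AMD and Intel may not measure against the same baseline).
Save something like the following as rapl-run.py,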
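#!/usr/bin/env python3
# Run a command and print the RAPL package energy consumed while it ran.
import subprocess
import sys

# Cumulative package-0 energy counter, in microjoules.
ENERGY_PATH = "/sys/class/powercap/intel-rapl:0/energy_uj"

def main():
    args = sys.argv[1:]
    if len(args) == 0:
        raise Exception("no arg")
    with open(ENERGY_PATH, "r") as f:
        val = int(f.readline())   # counter before the command
        subprocess.call(args)
        f.seek(0)                 # rewind to re-read the same sysfs file
        val2 = int(f.readline())  # counter after the command
    # Counter wraparound is not handled here.
    delta = val2 - val
    print("%f [J]" % (delta / 1e6))  # microjoules -> joules
if __name__ == '__main__':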
    main()
and run it as:
$ rapl-run.py perf stat make -j $(expr $(nproc) '*' 2)
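The energy_uj counter is cumulative and wraps around; the same powercap directory exposes its range as max_energy_range_uj, so a plain subtraction can come out negative on a long run. A minimal sketch of a wrap-aware delta, assuming the counter wraps at most once during the measurement. The helper names `read_uj` and `energy_delta_uj` are made up for illustration and are not part of rapl-run.py:

```python
from pathlib import Path

# Package-0 powercap directory, the same one rapl-run.py reads from.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name):
    """Read one integer attribute (in microjoules) from the powercap directory."""
    return int((RAPL_DIR / name).read_text())


def energy_delta_uj(before, after, max_range):
    """Return after - before, assuming the counter wrapped at most once."""
    delta = after - before
    if delta < 0:
        # The counter wrapped around; add back its full range.
        delta += max_range
    return delta


# Usage sketch (reading these sysfs files may require root):
#   before = read_uj("energy_uj")
#   ... run the workload ...
#   after = read_uj("energy_uj")
#   joules = energy_delta_uj(before, after, read_uj("max_energy_range_uj")) / 1e6
```

If the workload could run long enough for the counter to wrap more than once, this correction is not sufficient and the counter would have to be sampled periodically instead.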
Ryzen
- 111.841197130 seconds time elapsed
 - 9396.697417 [J]
 - 0.92 insn per cycle
 
i7
- 68.777557306 seconds time elapsed
 - 9956.462915 [J]
 - 1.09 insn per cycle
 
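The Ryzen 7 3700X finishes in 111.8 seconds, whereas the i7-12700K finishes in 68.8 seconds.
Measured this way, the i7 consumes slightly more energy (9956 J vs. 9397 J).
IPC is better on the i7 (1.09 vs. 0.92).
Running on only the 8 P-cores takes 74.5 seconds and 11341 [J].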
- 74.519648981 seconds time elapsed
 - 11340.825215 [J]
 - 1.11 insn per cycle
 
Running on only the 4 E-cores:
- 347.804299616 seconds time elapsed
 - 7742.243791 [J]
 - 1.15 insn per cycle
 
The E-cores have the highest IPC (1.15).
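Because the runs take very different amounts of time, dividing each energy figure by its elapsed time gives the average package power, which is a separate question from total energy. A small sketch using only the numbers listed above; the function name `average_watts` is made up here, and the caveat that the AMD and Intel sensors may use different baselines still applies:

```python
# (elapsed seconds, energy in joules) copied from the results above.
runs = {
    "Ryzen": (111.841197130, 9396.697417),
    "i7": (68.777557306, 9956.462915),
    "i7, P-cores only": (74.519648981, 11340.825215),
    "i7, E-cores only": (347.804299616, 7742.243791),
}


def average_watts(seconds, joules):
    """Average power in watts: energy divided by elapsed time."""
    return joules / seconds


for name, (seconds, joules) in runs.items():
    print(f"{name}: {average_watts(seconds, joules):.1f} W average")
```

On these figures the E-core-only run has by far the lowest average power and the lowest total energy, but also the longest build time.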
# Ryzen 7 3700X
 Performance counter stats for 'make -j32':
      1,645,905.15 msec task-clock                #   14.716 CPUs utilized          
           374,169      context-switches          #  227.333 /sec                   
            38,120      cpu-migrations            #   23.161 /sec                   
        53,201,666      page-faults               #   32.324 K/sec                  
 6,186,487,258,458      cycles                    #    3.759 GHz                      (83.79%)
   584,145,164,956      stalled-cycles-frontend   #    9.44% frontend cycles idle     (83.77%)
   387,679,849,583      stalled-cycles-backend    #    6.27% backend cycles idle      (83.74%)
 5,664,603,284,033      instructions              #    0.92  insn per cycle         
                                                  #    0.10  stalled cycles per insn  (83.77%)
 1,197,485,237,223      branches                  #  727.554 M/sec                    (83.78%)
    35,326,280,086      branch-misses             #    2.95% of all branches          (83.78%)
     111.841197130 seconds time elapsed
    1492.347945000 seconds user
     142.398664000 seconds sys
9396.697417 [J]
# i7-12700K
 Performance counter stats for 'make -j 40':
      1,165,372.81 msec task-clock                #   16.944 CPUs utilized
           290,107      context-switches          #  248.939 /sec
            39,203      cpu-migrations            #   33.640 /sec
        53,182,055      page-faults               #   45.635 K/sec
 5,203,504,969,685      cycles                    #    4.465 GHz
 5,650,225,661,659      instructions              #    1.09  insn per cycle
 1,192,600,178,609      branches                  #    1.023 G/sec
    30,412,406,301      branch-misses             #    2.55% of all branches
      68.777557306 seconds time elapsed
    1074.093308000 seconds user
      90.981504000 seconds sys
9956.462915 [J]
# P-cores only
# $ numactl -C 0-15 rapl-run.py perf stat make -j 32
 Performance counter stats for 'make -j 32':
      1,081,639.26 msec task-clock                #   14.515 CPUs utilized
           247,198      context-switches          #  228.540 /sec
            39,099      cpu-migrations            #   36.148 /sec
        53,195,128      page-faults               #   49.180 K/sec
 5,069,304,612,802      cycles                    #    4.687 GHz
 5,650,121,874,431      instructions              #    1.11  insn per cycle
 1,192,553,666,047      branches                  #    1.103 G/sec
    29,885,232,531      branch-misses             #    2.51% of all branches
      74.519648981 seconds time elapsed
    1002.042991000 seconds user
      79.129940000 seconds sys
11340.825215 [J]
# E-cores only
# $ numactl -C 16-19 rapl-run.py perf stat make -j 8
 Performance counter stats for 'make -j 8':
      1,360,440.69 msec task-clock                #    3.912 CPUs utilized
           224,271      context-switches          #  164.852 /sec
            19,907      cpu-migrations            #   14.633 /sec
        53,176,928      page-faults               #   39.088 K/sec
 4,895,967,057,617      cycles                    #    3.599 GHz
 5,649,308,720,014      instructions              #    1.15  insn per cycle
 1,192,383,821,310      branches                  #  876.469 M/sec
    33,308,624,261      branch-misses             #    2.79% of all branches
     347.804299616 seconds time elapsed
    1264.263234000 seconds user
      96.071435000 seconds sys
7742.243791 [J]
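The "insn per cycle" figure in the perf stat output above is simply instructions divided by cycles, so it can be recomputed from the raw counters. A rough sketch that parses the plain-text counter lines; the function names `parse_perf_counters` and `ipc` are made up for illustration, and the regular expression only assumes the "count, then event name" layout seen in these dumps:

```python
import re


def parse_perf_counters(text):
    """Map event name -> integer count for lines like '  123,456  cycles  # ...'."""
    counters = {}
    for line in text.splitlines():
        # Leading count with thousands separators, then the event name.
        # Lines such as task-clock (decimal value) or elapsed time do not match.
        m = re.match(r"\s*([\d,]+)\s+([\w-]+)", line)
        if m:
            counters[m.group(2)] = int(m.group(1).replace(",", ""))
    return counters


def ipc(counters):
    """Instructions per cycle from the parsed counters."""
    return counters["instructions"] / counters["cycles"]


# Two lines copied from the E-cores-only run above.
sample = """
 4,895,967,057,617      cycles                    #    3.599 GHz
 5,649,308,720,014      instructions              #    1.15  insn per cycle
"""
print(f"{ipc(parse_perf_counters(sample)):.2f} insn per cycle")
```

Feeding each of the four dumps through the same function is one way to double-check the IPC values quoted in the summary.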