foxtran

foxtran / transpose.f
Created October 9, 2024 16:40
Perform B <- alpha * A^T + B and B <- A^T in Fortran in a sub-optimal way with OpenMP parallelization. The given implementation is 3-10 times faster than a naive implementation based on BLAS-1 routines.
!>
!> @brief Perform B <- alpha * A^T + B
!>
!> @note OpenMP parallelization is used to speed up the matrix transpose.
!> The speed-up is about 3-10 times compared
!> with the original implementation via BLAS-1 routines.
!>
!> @param[in] n size of square matrix
!> @param[in] alpha scaling factor
!> @param[in] mat n-by-n matrix A
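The preview above stops at the parameter list, so the routine body is not shown here. As a rough illustration of the operation it describes (B <- alpha * A^T + B with OpenMP), here is a minimal free-form sketch. The module name `transpose_sketch`, the routine name `add_scaled_transpose`, the argument names `a` and `b`, and the tile size `bs = 32` are all assumptions made for this example; none of them come from the gist, and the tiling scheme is one common approach, not necessarily the one the author used.

```fortran
module transpose_sketch
  implicit none
  private
  public :: add_scaled_transpose

  integer, parameter :: dp = kind(1.0d0)
  ! Assumed tile edge; a value to tune per machine, not taken from the gist.
  integer, parameter :: bs = 32

contains

  ! Computes b <- alpha * transpose(a) + b for square n-by-n matrices.
  ! The matrix is processed in bs-by-bs tiles so that both the strided
  ! reads of a and the contiguous writes of b stay within a small
  ! working set. Tiles are distributed across OpenMP threads.
  subroutine add_scaled_transpose(n, alpha, a, b)
    integer, intent(in) :: n
    real(dp), intent(in) :: alpha
    real(dp), intent(in) :: a(n, n)
    real(dp), intent(inout) :: b(n, n)
    integer :: i, j, ii, jj

    ! Each (ii, jj) tile writes a disjoint block of b, so no
    ! synchronization is needed between threads.
    !$omp parallel do collapse(2) schedule(static) private(i, j)
    do jj = 1, n, bs
      do ii = 1, n, bs
        do j = jj, min(jj + bs - 1, n)
          do i = ii, min(ii + bs - 1, n)
            b(i, j) = b(i, j) + alpha * a(j, i)
          end do
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine add_scaled_transpose

end module transpose_sketch
```

A caller would `use transpose_sketch` and invoke `call add_scaled_transpose(n, alpha, a, b)`. Without an OpenMP flag (for example `-fopenmp` with gfortran) the directives are treated as comments and the routine runs serially with the same result.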
module helper_mod {
public i32 sudctoi(c8 c) {
consteval i32 C0 = i32('0'), C9 = i32('9'), CA = i32('A'), CZ = i32('Z')
if (i32(c) >= C0 .and. i32(c) <= C9) {
return i32(c) - C0
} else if (i32(c) >= CA .and. i32(c) <= CZ) {
return i32(c) - CA + 10
} else {
return 0
}
#!/bin/bash
## This script does the following:
## 1. It checks out and builds trunk LLVM.
## 2. It checks out and builds the create_llvm_prof tool.
## 3. It builds multiple clang binaries towards building a
## propeller optimized clang binary.
## 4. It runs performance comparisons of a baseline clang
## binary and the Propeller optimized clang binary.

Benchmark was compiled using the following compiler:

GCC version 12.2.0

Benchmark was compiled with the following options:

-mabi=lp64d -mcpu=sifive-u74 -misa-spec=20191213 -march=rv64imafdc_zicsr -O3 -ffree-line-length-none -fpre-include=/usr/include/finclude/riscv64-linux-gnu/math-vector-fortran.h

Number of repeats is: 100000

Benchmark was compiled using the following compiler:

GCC version 12.2.0

Benchmark was compiled with the following options:

-mabi=lp64d -mcpu=sifive-s76 -misa-spec=20191213 -march=rv64ifd_zicsr -O3 -ffree-line-length-none -fpre-include=/usr/include/finclude/riscv64-linux-gnu/math-vector-fortran.h

Benchmark was compiled using the following compiler:

GCC version 11.3.0

Benchmark was compiled with the following options:

-mabi=lp64d -mcpu=sifive-u74 -misa-spec=2.2 -march=rv64imafdc -O3 -Wall -Wextra -ffree-line-length-none -fpre-include=/usr/include/finclude/riscv64-linux-gnu/math-vector-fortran.h
24
symmetry c1
C -0.162106268 0.986301416 2.475612750
C -0.162101268 -0.406019584 2.475612750
C 1.043582732 -1.102098584 2.475260750
C 2.249277732 -0.406018584 2.475612750
C 2.249273732 0.986302416 2.475612750
C 1.043588732 1.682382416 2.475260750
H -1.092522268 1.524132416 2.470409750
H -1.092523268 -0.943850584 2.470409750
PROGRAM test_div
IMPLICIT NONE
INTEGER :: count1, count_rate1, count_max1
INTEGER :: count2, count_rate2, count_max2
INTEGER :: N
REAL(8), allocatable :: a(:), b(:), c(:)
read(*,*) N
allocate(a(N), b(N), c(N))
CALL RANDOM_NUMBER(a)
CALL RANDOM_NUMBER(b)