foxtran / transpose.f
Created October 9, 2024 16:40
Perform B <- alpha * A^T + B and B <- A^T in Fortran with OpenMP parallelization. The implementation is not fully optimized, but it is 3-10 times faster than a naive implementation based on BLAS-1 routines.
!>
!> @brief Perform B <- alpha * A^T + B
!>
!> @note OpenMP parallelization is used to speed up the matrix transposition.
!> The speed-up is about 3-10 times compared with
!> the original implementation based on BLAS-1 routines.
!>
!> @param[in] n size of the square matrix
!> @param[in] alpha scaling factor
!> @param[in] mat n-by-n matrix A