@SteveBronder
Last active March 19, 2025 20:47

Eigen 3.5.0 Release Notes

Breaking Changes

-#606: Removed DynamicSparseMatrix; this is an API break.

-#649: Move Eigen::all, Eigen::last, and Eigen::lastp1 back into the Eigen::placeholders namespace to reduce name collisions.

-#740: Remove DenseBase::nonZeros() as it duplicates DenseBase::size() functionality.

-#658: Update SVD Module to allow specifying computation options with a template parameter, replacing the previous QRPreconditioner parameter.

-#749: Reverted SVD module update to restore compatibility with third-party libraries.

-#744: Require recent GCC and MSVC versions; removed EIGEN_HAS_CXX14 and other feature test macros.

-#771: Renamed Eigen::internal::size to ssize to prevent ADL conflicts, aligning with C++ standards.

-#826: Introduced Options template parameter in SVD module with API breaking changes for improved flexibility.

-#857: Re-add svd::compute(Matrix, options) method to avoid breaking external projects.

-#1240: Revert comparison overloads to return bool array and add cwiseTypedLesser for typed comparisons.

-#1260: Require C++14 standard for detecting Inf and NaN.

-#1280: Disable raw array indexed view access for 1D arrays to improve stability.

-#1301: Ensure Euler angles are returned with canonical ranges, impacting existing computations.

-#1475: Remove MoreVectorization feature due to redundancy and potential ODR violations.

-#1474: Remove Skyline module due to non-functionality and lack of tests.

-#1498: Removed r_cnjg due to conflicts with f2c, inlining functions to resolve duplicate symbol errors.

-#1497: Removed non-standard int return types from BLAS/LAPACK functions to improve interoperability and reduce symbol conflicts.

-#1730: Revert change to make fixed-size objects trivially move assignable due to issues with setZero().
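
Entry #771 above refers to the standard's signed-size convention (C++20 provides std::ssize). The following is a minimal plain C++ sketch of that idea, written without Eigen; it is not Eigen's internal helper, and the names signed_size and last_negative_index are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Eigen's): returns a container's size as a signed
// integer, mirroring the idea behind C++20's std::ssize.
template <typename Container>
std::ptrdiff_t signed_size(const Container& c) {
  return static_cast<std::ptrdiff_t>(c.size());
}

// Hypothetical helper: index of the last negative entry, or -1 if there is none.
// The reverse loop needs a signed index so that `i >= 0` can become false;
// with an unsigned index the condition would always hold.
inline std::ptrdiff_t last_negative_index(const std::vector<int>& v) {
  for (std::ptrdiff_t i = signed_size(v) - 1; i >= 0; --i) {
    if (v[static_cast<std::size_t>(i)] < 0) return i;
  }
  return -1;
}
```

A signed size also keeps expressions such as size - 1 well behaved for empty containers, which is one reason index-heavy code tends to prefer it.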

Major Features

-#515: Add random matrix generation via SVD for enhanced testing capabilities.

-#612: Add support for EIGEN_TENSOR_PLUGIN and related functionalities for enhanced tensor manipulation.

-#447: Introduces BiCGSTAB(L) algorithm for solving linear systems, enhancing capabilities for non-symmetric systems.

-#577: Introduces the IDR(s)STAB(l) method for solving sparse square problems, enhancing convergence and computational efficiency.

-#791: Add support for Cray, Fujitsu, and Intel ICX compilers.

-#856: Add support for Apple's Accelerate sparse matrix solvers to enhance performance of sparse matrix computations.

-#798: Introduced a NNLS solver using the active-set algorithm, enhancing the library's capabilities for non-negative least squares problems.

-#965: Add fused multiply functions for PowerPC - pmsub, pnmadd and pnmsub.

-#356: Added PocketFFT support in FFT module to improve accuracy and performance over KissFFT.

-#973: Introduces the .arg() method to enhance Tensor functionality.

-#981: Introduced an MKL adapter in FFT module with fixes and new library implementations.

-#978: Add Sparse Subset of Matrix Inverse using the Takahashi algorithm and Kahan Summation for efficiency and stability.

-#1004: Implemented true determinant calculation for QR decomposition classes.

-#1029: Add fixed power unary operation for coefficientwise real-valued power operations on arrays.

-#1017: Add support for AVX512-FP16 for vectorizing half precision math.

-#1047: Introduces a skew symmetric matrix class for 3D vectors using Rodrigues' rotation formula.

-#1082: Add a vectorized implementation of atan2 for enhanced performance in Eigen.

-#1097: Introduced a signbit function to efficiently determine the sign of floating point values.

-#1098: Implemented cross product for 2D vectors, enriching calculations.

-#1103: Introduced a utility to sort inner vectors of sparse matrices using a customizable comparison function, enhancing sparse algorithm performance.

-#1133: Introduced setEqualSpaced for vectorized creation of equally spaced vectors.

-#1152: Add template for QR permutation index type and fix ColPivHouseholderQR Lapacke bindings.

-#1211: Introduced CArg for vectorized complex argument calculations in Eigen.

-#1203: Introduces typed logical operators, enabling full vectorization and improved handling of logical operations across scalar types.

-#1244: Introduce permutation index specification for PartialPivLU and FullPivLU, enhancing compatibility with Lapacke ILP64.

-#1281: Introduced insertFromTriplets and insertFromSortedTriplets for efficient batch insertions in sparse matrices.

-#1285: Introduces USM support for SYCL, simplifying usage and improving performance.

-#1314: Introduce canonicalEulerAngles method to provide standardized angle ranges.

-#1330: Enabling half precision support for SYCL using Eigen::half abstraction.

-#1403: Implemented component-wise cubic root (cbrt) calculations for arrays and matrices.

-#1462: Introduces feature to specify a temporary directory for file I/O outputs, enhancing compatibility with systems that restrict writing to the current directory.

-#1414: Implemented plog_complex to handle vectorized complex functions.

-#1512: Add method signDeterminant() to QR and related decompositions.

-#1554: Introduced SimplicialNonHermitianLLT and SimplicialNonHermitianLDLT solvers for complex symmetric matrix handling.

-#1395: Incorporate Threadpool in Eigen Core for enhanced computation performance.

-#1696: Make fixed size matrices and arrays trivially_default_constructible.

-#1627: Implement Tensor roll function for circular shifts.

-#1777: Add support for LoongArch64 LSX architecture to enhance Eigen's capabilities.
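
Entry #978 above mentions Kahan summation. For readers unfamiliar with it, the sketch below shows the generic technique in plain standard C++. It does not use Eigen and makes no claim about how the sparse inverse code applies it; the function names naive_sum and kahan_sum are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain left-to-right summation, shown only for comparison.
inline double naive_sum(const std::vector<double>& xs) {
  double sum = 0.0;
  for (double x : xs) sum += x;
  return sum;
}

// Kahan (compensated) summation: a second variable tracks the low-order
// bits lost in each addition and feeds them back into the next term.
inline double kahan_sum(const std::vector<double>& xs) {
  double sum = 0.0;
  double c = 0.0;  // running compensation for lost low-order bits
  for (double x : xs) {
    const double y = x - c;    // apply the correction to the next term
    const double t = sum + y;  // low-order bits of y may be lost here
    c = (t - sum) - y;         // recover what was lost (with opposite sign)
    sum = t;
  }
  return sum;
}
```

The compensation only works if the compiler does not reassociate floating-point operations, so options such as -ffast-math would defeat this sketch.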

Improvements

-#544: Added support for Eigen::Block types to GDB pretty printer.

-#607: Added a flowchart to aid in selecting unsupported sparse iterative solvers.

-#610: Updated CMake configuration to require at least C++11, centralizing standard settings for clarity and maintainability.

-#605: Updated RandomSetter in SparseExtra to use unordered_map for improved performance.

-#543: Fix PEP8 and formatting issues in GDB pretty printer.

-#611: Included unordered_map header to enhance functionality related to unordered maps.

-#613: Fixes Eigen::fix<N> and symbolic_index test for environments without variable template support.

-#614: Enhances LAPACK test compatibility with newer Fortran compilers, addressing argument mismatches.

-#615: Include intrin header for better compatibility on Windows ARM.

-#616: Disable CUDA Eigen::half vectorization on host for versions before 10.0.

-#621: Improved compatibility and reduced warnings for GCC 4.8 on ARM.

-#619: Fixed documentation for unsupported linear solvers.

-#629: Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang to enhance compatibility.

-#628: Renamed 'vec_all_nan' to resolve ppc64le build failures.

-#485: Remove deprecated package config variables from CMake configuration.

-#622: Rename Tuple to Pair and introduce a Tuple class enhancing GPU compatibility.

-#632: Simplified CMake configuration by removing unused EIGEN_DEFINITIONS.

-#633: Use ARCH_INDEPENDENT versioning in CMake for improved package configuration management.

-#617: Enhanced matrixmarket reader/writer to support various dense matrices.

-#634: CMake now populates package registry by default for improved package management.

-#635: Fixed tridiagonalization_inplace_selector to align hCoeffs vector with MatrixType, enhancing stability.

-#636: Remove stray DynamicSparseMatrix references to clean and maintain codebase.

-#637: Removed extra DynamicSparseMatrix references and fixed typos for improved code clarity.

-#638: Added missing packet types in the pset1 call to improve robustness.

-#482: Add LLDB Pretty Printer for enhanced debugging of Eigen matrices and vectors.

-#624: Add Serializer<T> for binary serialization to enhance GPU testing.

-#641: Removed unnecessary std::tuple reference to simplify the code.

-#631: Issue an error when internal headers are included directly to enhance code safety.

-#625: Introduce new GPU test utilities enhancing execution flexibility across CPU and GPU.

-#643: Minor fix for compilation error on HIP to enhance compatibility.

-#645: Introduced a default constructor for eigen_packet_wrapper, simplifying memory operations with memcpy.

-#647: Clean up static assertions to use standard C++11 static_assert for better error messages and performance.

-#651: Removed -fabi-version=6 flag from AVX512 builds for improved compatibility and performance.

-#646: Add buildtests_gpu and check_gpu to simplify GPU testing.

-#653: Disabled subtests on HIP to maintain test stability due to device side malloc/free limitations.

-#652: Added a macro to pass arguments to ctest for running tests in parallel.

-#654: Silence string overflow warning for GCC in initializer_list_construction test.

-#656: Fix strict aliasing bug causing product_small failure, enhancing reliability of small matrix operations.

-#655: Implemented parallel execution of CI tests on all CPU cores, enhancing test speed and efficiency.

-#657: Fix implicit conversion warnings in tuple_test to enhance type safety and code clarity.

-#572: Removed unnecessary const when returning by value to enhance code readability.

-#660: Fix various typos to enhance clarity and professionalism in the documentation.

-#661: Fixes typographical errors to improve readability and professionalism.

-#664: Disable testing of complex compound assignment operators for MSVC to prevent compilation issues.

-#671: Improved GPU special function test accuracy by aligning with scipy.

-#669: Optimized tensor_contract_gpu test by reducing contraction count to prevent timeouts.

-#667: Speed up tensor reduction using loop strip mining and unrolling techniques for enhanced performance.

-#678: Moved CUDA/Complex.h to GPU/Complex.h and removed deprecated TensorReductionCuda.h.

-#665: Fix tuple compilation issues for Visual Studio 2017 by replacing tuple alias with TupleImpl.

-#666: Fix MSVC+NVCC EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR compilation issues.

-#676: Improve accuracy of full tensor reduction for half and bfloat16.

-#686: Revert bit_cast to use memcpy for CUDA to avoid undefined behavior.

-#687: Add nan-propagation options to matrix and array plugins for improved handling of NaN in min/max operations.

-#691: Fix -Wbitwise-instead-of-logical clang warning to enhance code clarity and correctness.

-#693: Clarified documentation on inner stride for compile-time vectors in Stride class.

-#692: Extend EIGEN_QT_SUPPORT to Qt6 for better compatibility.

-#688: Introduced nan-propagation options to enhance NaN value handling in matrix and array plugins.

-#696: Removed const from visitor return type to fix compatibility issues on ARM and PPC.

-#689: Fixed index-out-of-bounds error in broadcasting for vectorized 1-dimensional inputs.

-#698: Enhance CommaInitializer to reuse fixed dimensions, improving performance and consistency.

-#695: Fix boostmultiprec test to compile with older Boost versions, enhancing compatibility.

-#681: Addressed integer overflows in EigenMetaKernel indexing to enhance robustness.

-#701: Updated alignment qualifier placement for consistency and addressed compiler warnings.

-#700: Vectorize fp16 tanh and logistic functions on Neon for enhanced performance.

-#694: Fix ZVector build issues on s390x and enhance test reliability.

-#697: Streamline CMake scripts to enhance subproject integration, reducing unnecessary test builds.

-#703: Fix NaN propagation in min/max functions for scalar inputs.

-#702: Introduces AVX vectorized implementation for float2half and half2float, boosting performance 3x.

-#705: Fix TensorReduction warnings and error bound for sum accuracy test.

-#707: Fix total deflation issue in BDCSVD when M is diagonal and add unit tests.

-#709: Fixed BDCSVD's total deflation logic to enhance performance with diagonal matrices.

-#714: Fix uninitialized matrix in nestbyvalue test to enhance reliability.

-#712: Enhance documentation for Quaternion constructor from MatrixBase, clarifying element order.

-#711: Fixes bug in macro definition EIGEN_HAS_FP16_C for non-Clang compilers.

-#715: Fix failing test for tensor reduction by comparing results against forward error bounds.

-#713: Avoid integer overflow in EigenMetaKernel indexing to enhance reliability and prevent CUDA errors.

-#121: Introduced a make format command for automatic code formatting.

-#720: Corrected a typo in the documentation to enhance clarity.

-#717: Moved prune function to SparseVector.h to improve sparse matrix storage organization.

-#718: Use consistent StorageIndex across SparseMatrix implementations.

-#327: Reimplemented Tensor stream output for enhanced flexibility and consistency.

-#723: Fix tensor broadcast off-by-one error to enhance robustness.

-#722: Enhanced efficiency in Umeyama algorithm by optimizing computation when scaling is not required.

-#724: Enhanced TensorIO to support TensorMap with const elements.

-#719: Fixed Sparse-Sparse Product in case of mixed StorageIndex types enhancing robustness and reliability.

-#728: Fix errors for Windows build and enhance compatibility.

-#726: Added basic iterator support for Eigen::array to ease transition to std::array.

-#725: Removed deprecated MappedSparseMatrix to enhance maintainability.

-#727: Make numeric_limits members constexpr as per the newer C++ standards.

-#729: Implemented Eigen::array<...>::reverse_iterator for enhanced iteration capabilities.

-#733: Fix warnings about shadowing definitions to improve code clarity and maintainability.

-#732: Remove EIGEN_HAS_CXX11 to streamline codebase and enhance maintainability.

-#737: Refactored LLT macro binding to Lapacke into smaller parts for better clarity and maintainability.

-#741: Fix for HIP compilation failure in DenseBase by adding EIGEN_DEVICE_FUNC modifiers.

-#735: Removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks to streamline code and enhance maintainability.

-#742: Updated minimum CMake version to 3.10, set GCC version to 5, and removed disabling of C++11 tests.

-#730: Fixes issue #2375 related to indexed views for non-Eigen types, improving stability.

-#736: Improved handling of non-const overloads in self-adjoint and triangular views when not referring to an lvalue.

-#746: Fixed Cholesky to handle 0-sized matrices, ensuring LAPACKE-based LLT aligns with Eigen's expectations.

-#739: Disabled GCC-4.8 tests to streamline transition to C++14.

-#752: Deprecate macro EIGEN_GPU_TEST_C99_MATH to reduce code clutter and simplify maintenance.

-#748: Improved lapacke binding code for HouseholderQR and PartialPivLU.

-#755: Removed unused else branch after #ifdef removal, enhancing code clarity.

-#757: Refactored IDRS code for stability and performance enhancements using stableNorm().

-#756: Improve compatibility with toolchains lacking atomic support by conditional inclusion of <atomic>.

-#759: Fix typo StableNorm to stableNorm to ensure naming consistency.

-#762: Improved readability and reliability of documentation snippets in the Eigen library.

-#761: Removed outdated compiler checks and flags to streamline the codebase.

-#765: Disambiguate overloads for empty index list to reduce compiler warnings and improve code clarity.

-#760: Removed using namespace Eigen in sample code to promote better coding practices.

-#767: Ensure exp(-Inf) returns zero for vectorized expressions and improve AVX2 and SSE performance.

-#758: Introduced GPU unit tests for HIP using C++14.

-#763: Removed use of deprecated CMake COMPILE_FLAGS in favor of modern options.

-#768: Removed custom Find*.cmake scripts in favor of CMake's built-in support for better compatibility.

-#770: Fixed customIndices2Array to include the first index, enhancing tensor module functionality.

-#769: Added guards to enforce proper header inclusion practices for CholmodSupport.

-#753: Converted computational macros to constexpr functions for enhanced type safety and code maintainability.

-#776: Enables EIGEN_TEST_CUSTOM_CXX_FLAGS to be used as a CMake list by converting spaces to semicolons.

-#782: Fix bug introduced in !751 affecting EIGEN_IMPLIES macro handling side-effects.

-#783: Simplify logical_xor() for bool types by using a != b.

-#785: Fixed Clang warnings about alignment change and floating point precision.

-#786: Small cleanup of GDB pretty printer code to improve readability and maintainability.

-#788: Minor fixes to documentation and code warnings for improved clarity and quality.

-#790: Added missing internal namespace qualifiers to improve clarity in vectorization logic tests.

-#779: Optimize exp(), improving its handling of denormal results and yielding a 4% speedup.

-#793: Removed unused macro EIGEN_HAS_STATIC_ARRAY_TEMPLATE to enhance maintainability.

-#797: Add bounds checking to the Eigen serializer to enhance data integrity and reliability.

-#800: Corrected functionality of GPU unit tests for HIP after serialization API changes.

-#792: Allow specifying inner and outer stride for CwiseUnaryView, enhancing functionality and control.

-#799: Improve plog: 20% speedup for float, plus handling of denormals.

-#802: Fixes truncation from unsigned int to bool, improving reliability of type conversions.

-#801: Fixes and cleanups for numeric_limits and psqrt bug.

-#796: Make fixed-size Matrix and Array trivially copyable using C++20 features.

-#803: Fix GCC 8.5 warning about missing base class initialization.

-#805: Ensure consistent values from scalar and vectorized paths in array.exp().

-#780: Improved accuracy and performance of logistic sigmoid function.

-#806: Fix IterativeSolverBase assertion messages to reference the correct class name.

-#795: Refactor to reduce usage of reserved names, enhancing compliance with C++ standards.

-#808: Addressed type compatibility in pmadd with explicit casting for enhanced type safety.

-#810: Fix two corner cases in logistic sigmoid to enhance accuracy and robustness.

-#809: Corrects broken assertions to enhance reliability and robustness.

-#811: Fix compilation issue with GCC < 10 and C++2a standard.

-#812: Fix implicit conversion warning in vectorwise_reverse_inplace.

-#814: Updated comment to replace reference to removed macro EIGEN_SIZE_MIN_PREFER_DYNAMIC with constexpr function.

-#813: Minor correction/clarification to LSCG solver documentation.

-#815: Fix implicit conversion warning in GEBP kernel's packing by changing data types to improve type safety.

-#819: Enhance clang warning suppressions by verifying supported warnings.

-#818: Silence MSVC warnings for cleaner builds and easier debugging.

-#772: Removed obsolete macros and implementation for better code maintainability.

-#817: Add support for packets of int64 on x86 to enhance processing efficiency.

-#821: Prevented heap allocation in diagonal matrix product by using reference types, optimizing performance.

-#822: Make casts explicit and fix type to prevent overflow issues in random test implementation.

-#825: Enhance handling of float warnings by refining comparisons and conversions.

-#827: Optimized preciprocal function for IEEE compliance, improving performance with division by zero and infinity.

-#830: Removed outdated documentation referencing C++98/03 standards.

-#835: Fixed ODR violations by removing unnamed namespaces from headers.

-#833: Fixes type discrepancy issues on 32-bit ARM by using int32_t consistently.

-#840: Correct use of EIGEN_CUDACC to respect EIGEN_NO_CUDA, preventing unwanted compilation of CUDA code when disabled.

-#838: Defined EIGEN_HAS_AVX512_MATH correctly in PacketMath to enhance AVX512 support.

-#836: Limit GCC<6.3 maxpd workaround to GCC, improving compatibility with Clang.

-#841: Consolidated and enhanced implementations of fast psqrt and prsqrt for correct handling of edge cases.

-#844: Updated MPL2 license link to use HTTPS for enhanced security.

-#843: Fix naming collisions with resolve.h to enhance code clarity and stability.

-#845: Provide a definition for numeric_limits static data members to enhance compliance with C++ standards.

-#846: Return alphas() and betas() by const reference to enhance performance and memory efficiency.

-#849: Enhance documentation with MatrixXNt and MatrixNXt details and fix namespace issues.

-#852: Add constexpr size() method to Eigen::IndexList for compile-time size evaluation.

-#853: Fix ODR failures in TensorRandom to enhance code stability and reliability.

-#855: Remove unused macros related to obsolete prsqrt implementation.

-#859: Fix MSVC+NVCC 9.2 pragma error by replacing _Pragma with __pragma.

-#850: Enhance Doxygen documentation by adding descriptions to Matrix typedefs.

-#861: Made FixedInt constexpr and fixed potential ODR issues by removing static from the variable template.

-#862: Restores fixed sizes for U/V matrices for fixed-sized inputs, reverting unnecessary dynamic sizing.

-#866: Initialize pointers to nullptr in SPQRSupport to prevent crash from invalid free() calls.

-#870: Fix test macro conflicts with STL headers in C++20.

-#865: Add an assert for the edge case where thin U is requested at runtime.

-#863: Modified test expressions to ensure numerical consistency across optimization levels.

-#868: Optimizations to fast SQRT/RSQRT for enhanced performance on modern x86 processors.

-#873: Disabled deprecated warnings in SVD tests to enhance log readability.

-#874: Fix gcc-5 packetmath_12 bug by initializing memory to zeroes.

-#875: Fix compilation error in packetmath by introducing a wrapper around psqrt.

-#877: Disable deprecated warnings for SVD tests on MSVC for cleaner build logs.

-#878: Fix frexp packetmath tests for MSVC to handle non-finite inputs correctly.

-#876: Fix mixingtypes for g++-11, improving stability and performance with AVX512 operations.

-#880: Fix SVD for MSVC by addressing a critical bug with Options template parameter handling.

-#879: Improved any/all reduction operations for row-major layout.

-#882: Fixes compatibility issues with SVD implementation for MSVC+CUDA addressing Index type discrepancies and function return warnings.

-#883: Adjust tolerance of matrix_power test for MSVC to reduce test failures.

-#884: Removed overly strict non-convergence checks in NonLinearOptimization tests to improve flexibility and reliability.

-#886: Skip denormal test if Cond is false, enhancing test suite efficiency.

-#885: Fix enum conversion warnings in BooleanRedux.

-#888: Speed up LSCG (least squares conjugate gradient) by using .noalias().

-#851: Fixes inconsistency in JacobiSVD_LAPACKE bindings to enhance SVD module.

-#887: Enhanced vectorization_logic tests for platform compatibility and reliability.

-#864: Removed unnecessary EIGEN_UNUSED decorations to improve code clarity.

-#890: Removed duplicate IsRowMajor declaration to reduce compilation warnings.

-#891: Split and reduce SVD test sizes to optimize memory usage and improve compile times.

-#893: Introduces new CMake options for controlling build components.

-#894: Fixed tensor executor test and supported tensor packets of size 1 for better platform compatibility.

-#897: Removed obsolete copy_bool workaround for gcc 4.3, enhancing code maintainability.

-#895: Introduce move constructors for SparseSolverBase and IterativeSolverBase for enhanced flexibility.

-#889: Introduce construct_at, destroy_at wrappers to replace placement new and explicit destructor calls, improving code clarity and safety.

-#898: Fix edge-case in zeta for large inputs to prevent NaNs and ensure scipy compatibility.

-#896: Removed ComputeCpp-specific code from SYCL Vptr to enhance compatibility and performance.

-#901: Fix construct_at compilation breakage on ROCm, improving compatibility with HIP environments.

-#900: Fix swap test for size 1 inputs to enhance test reliability.

-#903: Convert bit calculation to constexpr, avoiding casts for readability and maintainability.

-#907: Enhanced PowerPC MMA flags with dynamic dispatch, improving compatibility and performance.

-#909: Removed outdated GCC-4 warning workarounds for a cleaner codebase.

-#829: Streamline codebase by replacing Eigen type metaprogramming with std types and alias templates.

-#914: Disabled Schur non-convergence test to improve reliability.

-#913: Enhanced PowerPC MMA flag handling for default builds.

-#915: Fix missing pound directive to prevent compilation errors.

-#917: Implement workaround for g++-10 docker optimization issue in geo_orthomethods_4.

-#911: Fix RowMajorBit <-> RowMajor mixup to enhance code robustness.

-#916: Updated EIGEN_ALTIVEC flags for compatibility with TensorFlow, allowing binary values and enhancing documentation.

-#923: Fix AVX512 builds with MSVC; improved compatibility and added comprehensive testing.

-#925: Fix ODR violation in trsm by marking functions as inline.

-#926: Fixed namespace usage to resolve compilation errors and enhance code stability.

-#927: Update warning suppression methods to enhance compatibility with newer compilers.

-#921: Optimize visitor traversal for RowMajor matrices to enhance performance by adjusting traversal methods to data layout.

-#918: Added missing explicit reinterprets to resolve g++ build errors.

-#930: Added a missing typename and fixed an unused typedef warning for better GCC 9 compatibility.

-#931: Re-enabled Aarch64 CI pipelines to enhance testing and validation on Aarch64 architecture.

-#892: Added is_constant_evaluated support and improved alignment checks.

-#937: Eliminated warnings related to unused trace statements for cleaner compilation.

-#924: Disable f16c scalar conversions for MSVC to enhance compatibility.

-#939: Removed .cpp file inclusions in LAPACK for better code clarity.

-#934: Fixed order of arguments in BLAS SYRK to resolve compilation errors and enhance code correctness.

-#941: Modified test_isApprox function to handle inf/nan comparisons correctly.

-#940: Reintroduced std::remove* aliases to restore compatibility for third-party libraries.

-#854: Added Scaling function overload for vector rvalue reference to improve usability and correctness.

-#904: Transformed static const members into constexpr for increased performance and optimization.

-#943: Enhanced compile-time evaluations by converting helper functions to constexpr in XprHelper.h.

-#944: Transitioned a metaprogramming utility to a constexpr function for improved compile-time evaluation and code usability.

-#942: Fixed navbar scroll issues by overriding Doxygen's initResizable() and adjusting TOC positioning.

-#945: Fixes max size expressions to ensure correct calculations and intended behavior.

-#949: Fix ODR issues in lapacke_helpers to enhance reliability and stability.

-#946: Removed legacy macro EIGEN_EMPTY_STRUCT_CTOR for improved code simplicity and maintainability.

-#953: Fixed ambiguous DiagonalMatrix constructors by clarifying initializer list usage.

-#951: Fix Power GEMV order of operations in predux for MMA, optimizing performance and fixing GCC assembly issues.

-#952: Allow all tests to pass with EIGEN_TEST_NO_EXPLICIT_VECTORIZATION to improve test stability.

-#962: Improved memory handling and performance in HouseholderSequence by eliminating unnecessary heap allocations and streamlining block logic.

-#963: Fix cwise NaN propagation for scalar input.

-#964: Fix compilation issue in HouseholderSequence.h related to InnerPanel template.

-#958: Fix compiler bugs for GCC 10 & 11 for Power GEMM to enhance compatibility and stability.

-#966: Removed need to supply the Symmetric flag to UpLo argument for Accelerate LLT and LDLT, simplifying solver usage.

-#967: Added load vector_pairs for GEMM MMA RHS and improved predux GEMV.

-#968: Made diagonal matrix cols() and rows() methods constexpr for enhanced compile-time evaluation.

-#969: Add uninstall target only if not already defined to enhance compatibility.

-#908: Fixed incorrect reference code in STL_interface.hh for ata_product to enhance reliability.

-#974: Prevent BDCSVD crash caused by index out of bounds.

-#977: Fixes BDCSVD numerical instability, enhancing robustness and reliability.

-#984: Unset the executable flag on specified files to improve file permission management.

-#985: Improve plogical_shift_* implementations and fix typo in SVE/PacketMath.h.

-#980: Avoid signed integer overflow in adjoint test to enhance reliability.

-#975: Introduces subMappers for Power GEMM packing, enhancing performance by 10%.

-#982: Avoid ambiguous Tensor comparison operators for C++20 compatibility.

-#986: Updated SYCL-2020 range handling to ensure compliance and improved parallel operation reliability.

-#976: Fixes incorrect LDLT results when using AutoDiffScalar with value 0, ensuring proper derivative handling.

-#971: Introduces R-Bidiagonalization in BDCSVD for improved performance on large matrices.

-#987: Fix integer shortening warnings in visitor tests.

-#989: Fix C++20 ambiguity of comparisons, enhancing clarity and compliance.

-#990: Introduces diagonal matrix multiplication and static initializers for zero and identity matrices.

-#991: Addressed ambiguous comparison warnings in C++20, improving TensorBase comparison operations.

-#993: Fix row vs column vector typo in Matrix class tutorial for improved clarity.

-#994: Marked index_remap as EIGEN_DEVICE_FUNC to enhance GPU utilization in Eigen's reshaping functionality.

-#995: Added documentation for DiagonalBase to enhance clarity and usability.

-#996: SYCL-Spec compliance for kernel names to enhance compatibility with SYCL-2020 specifications.

-#999: Use numext::sqrt in Householder.h to simplify custom type integration.

-#1003: Eliminate undef warnings when not compiling for AVX512 to enhance code stability.

-#1001: Skip f16/bf16 bessel specializations on AVX512 if unavailable, enhancing portability and reducing build errors.

-#1002: Fix clang-tidy warnings and reformat code for improved readability.

-#947: Introduced partial loading and storing operations to enhance memory access and performance.

-#1005: Re-enable unit tests for device side malloc after ROCm 5.2 fix.

-#1007: Fix ODR violations by replacing unnamed type with named type, enhancing stability and clarity.

-#1006: Ensure AutoDiff module includes Core dependency for better consistency.

-#1009: Corrected Doxygen group usage to enhance documentation clarity.

-#1013: Add option to disable AVX512 GEBP kernels.

-#1014: Fix aligned_realloc to call check_that_malloc_is_allowed() if ptr == 0, enhancing memory management integrity.

-#1015: Disable AVX512 GEMM kernels by default to enhance stability and prevent segmentation faults.

-#1016: Improved Emscripten compatibility by including immintrin.h header.

-#1021: Updated documentation for AccelerateSupport after PR 966.

-#1019: Avoid including with EIGEN_NO_IO to enhance compatibility with embedded systems.

-#1020: Enhanced ConjugateGradient to use numext::sqrt for compatibility with custom numeric types.

-#1023: Fix flaky packetmath_1 test for increased reliability.

-#1010: Fix inner iterator for sparse block to enhance reliability of sparse matrix operations.

-#1025: Fix use of Packet2d type for non-VSX to enhance portability and usability.

-#1028: Fix non-VSX PowerPC build to enhance compatibility and usability.

-#1027: Fix code and unit test for corner cases in vectorized pow() function.

-#1012: Fix vectorized Jacobi Rotation to utilize packet math vectorized version and ensure 'fixed-size' code path passes tests.

-#1026: Vectorize the sign operator to enhance performance for real types.

-#1030: Resolve compilation errors by avoiding double definitions of Half functions on aarch64 during GPU compilation.

-#1031: Eliminated bool bitwise warnings to improve code clarity and maintainability.

-#1032: Disable invalid deprecation warnings in BDCSVD class.

-#1033: Fix and enhance accuracy of SYCL tests and tensor operations.

-#1035: Removed unnecessary FP16C checks for AVX512 to enhance performance.

-#1034: Improved pow<double> performance by 11-15% with a new division algorithm.

-#1037: Protect new pblend implementation with EIGEN_VECTORIZE_AVX2 to enhance robustness and compatibility.

-#1039: Fixes psign for unsigned integer types, enhancing robustness and correctness.

-#1044: Added missing pointer in realloc call to improve memory management.

-#1045: Enhanced GeneralizedEigenSolver::info() reliability and error clarity.

-#1042: Addressed undefined behavior in array_cwise test caused by signed integer overflow.

-#1046: Re-enable pow function for complex types, enhancing mathematical operations in the Eigen library.

-#1043: Vectorize pow for integer base / exponent types to improve performance and robustness.

-#1038: Vectorize acos, asin, and atan for float with significant accuracy and performance enhancements.

-#1048: Fix test build errors related to new unary power functionality, improving compatibility and flexibility.

-#1049: Fixes two typos in the slicing tutorial documentation.

-#1051: Updated mixingtypes tests to accommodate changes in unary pow operation.

-#1052: Fixes CMake issues by adjusting benchmark builds and handling test dependencies.

-#1050: Add asserts for index-out-of-bounds in IndexedView to enhance error checking and prevent runtime errors.

-#1053: Fixed MSVC compilation error in GeneralizedEigenSolver.h by adding missing semicolon.

-#1056: Reduce compiler warnings for tests, leading to cleaner build output.

-#1057: Adjusted overflow threshold bounds for pow function tests to enhance CI pipeline reliability.

-#899: Introduced C++14 constexpr support for Maps and basic operations, enhancing compile-time capabilities.

-#1061: Tweak bound for pow to account for floating-point types, improving reliability and fixing specific failures.

-#1060: Fix realloc for non-trivial types to enhance stability in memory handling.

-#1064: Fix g++-6 constexpr and C++20 build errors for better compliance.

-#1063: Address issues with unary pow() to enhance type safety and correctness.

-#1069: Removed faulty skew_symmetric_matrix3 test to improve test robustness and mitigate potential msan errors.

-#1066: Allow mixed types for pow() if exponent is exactly representable in base type.

-#1070: Fix test for pow with mixed integer types to prevent unintended conversions.

-#1077: Addressed unused-result warning in ROCm integration related to gpuGetDevice.

-#1078: Add macro to optimize GEBP kernel for NEON architecture.

-#1080: Remove unused typedef to enhance code clarity and maintainability.

-#1079: Reduce compilation time/memory for GEBP kernel using EIGEN_IF_CONSTEXPR.

-#1083: Reduces memory footprint of GEBP kernel for non-ARM targets to improve MSVC build performance.

-#1084: Vectorize atan() for double to enhance performance and accuracy.

-#1085: Fix 4x4 inverse issues when using -Ofast compilation.

-#1088: Replaced assert with eigen_assert for consistency and configurability.

-#1089: Unconditionally enable CXX11 math across the Eigen library to enhance compatibility and consistency.

-#1087: Introduced a refined range reduction strategy for atan<float>() improving performance by 20-40% on x86 architectures.

-#1091: Enhance AttributeMacros with new macros for better clang-format compatibility.

-#1092: Removed references to M_PI_2 and M_PI_4 to improve code clarity and portability.

-#1093: Enhanced handling of NaN inputs in atan2 function to improve reliability.

-#1094: Fix -Wunused-but-set-variable warnings in Eigen/Sparse for improved code cleanliness.

-#1095: Refactor special values test for pow and add similar test for atan2 to enhance coverage.

-#1096: Fix bug in atan2 function for better cross-platform compatibility.

-#1099: Clarified that indices must be sorted in documentation.

-#1101: Modify memory functions to use 1-byte offset for improved alignment handling.

-#1105: Fix pragma check for disabling fastmath to enhance reliability and numerical stability.

-#1102: Added assert to validate outer index array size in SparseMapBase.

-#1106: Fixes offset computation in handmade_aligned_malloc to reduce compiler warnings and improve memory safety.

-#1107: Disabled patan for double precision on PPC to fix build issues.

-#1100: Enabled resizing of dynamic empty matrices to enhance flexibility and accuracy.

-#1110: Remove unused parameter name to enhance code readability and maintainability.

-#1109: Removed an unnecessary assert in SparseMapBase to allow flexibility in sparse matrix population.

-#1112: Fixes a typo in CholmodSupport for improved code readability.

-#1118: Fix ambiguity in PPC for vec_splats call by clarifying type usage.

-#1119: Implement bracket notation for unsigned type names to enhance code clarity and consistency.

-#1116: Corrected handling of floating-point zero in pnegate function by directly flipping the sign bit.
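
The reason a sign-bit flip matters for zero can be shown with a scalar sketch. This assumes IEEE 754 binary32 floats and default rounding; `negate_by_subtraction` and `negate_by_sign_flip` are hypothetical names for illustration, not Eigen functions, and the actual packet code may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: negation written as a subtraction from zero.
// Under round-to-nearest, 0.0f - (+0.0f) is +0.0f, so the sign is lost.
inline float negate_by_subtraction(float x) { return 0.0f - x; }

// Hypothetical helper: negation by toggling the IEEE 754 sign bit.
// This maps +0.0f to -0.0f and -0.0f to +0.0f.
inline float negate_by_sign_flip(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // well-defined type punning
  bits ^= 0x80000000u;                  // flip only the sign bit
  std::memcpy(&x, &bits, sizeof x);
  return x;
}
```

With these helpers, `std::signbit(negate_by_sign_flip(0.0f))` is true, while `std::signbit(negate_by_subtraction(0.0f))` is false.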

-#1113: Fix duplicate execution code for Power 8 Altivec in pstore_partial.

-#1117: Minor improvements to IDRS.h for code cleanliness and readability.

-#1120: Fix bug in handmade_aligned_realloc to enhance memory management and prevent undefined behavior.

-#1121: Add serialization for sparse matrix and sparse vector.

-#1122: Fix compiler warnings in test files to improve code quality and maintainability.

-#1125: Introduce synchronize method to all devices to enhance flexibility and testing.

-#1124: Fix sparseLU solver to handle destinations with non-unit stride.

-#1114: Modified BiCGSTAB parameters initialization to support custom types.

-#1123: Fix reshape strides when input has non-zero inner stride.

-#1127: Fix serialization and enhance robustness for non-compressed matrices.

-#1130: Fix index type for sparse index sorting.

-#1128: Enable direct access for NestByValue to enhance performance.

-#1134: Optimize equalspace packet operation for improved performance and efficiency.

-#1090: Allow std::initializer_list constructors in constexpr expressions for enhanced usability and compatibility with modern C++.

-#1135: Enhanced compatibility by removing std::raise() dependency for handling divide by zero.

-#1137: Replaced std::signbit with numext::signbit for bfloat16 compatibility.

-#1138: Update test framework for numext::signbit to enhance reliability and accuracy.

-#1139: Added operators to CompressedStorageIterator to enhance functionality and usability.

-#1144: Fix up C++ version detection macros and cmake tests to enhance compatibility and stabilize CI.

-#1145: Adjust thresholds for bfloat16 product tests to enhance reliability.

-#1140: Updated SparseLU for enhanced compatibility and fixed initialization bug.

-#1149: Fix git add . to include scripts/buildtests.in by modifying .gitignore.

-#1151: Fix EIGEN_HAS_CXX17_OVERALIGN for icc to enhance compatibility with Intel C++ Compiler.

-#1155: Fix overalign check to enhance compiler compatibility.

-#1156: Fix minor build and test issues for better reliability and performance.

-#1158: Clarified spbenchsolver help message to improve naming conventions for SPD matrices and rhs files.

-#1147: Overhauled Sparse Core to enhance performance and maintainability of sparse matrix operations.

-#1159: Re-introduced missing header for GPU tests to restore functionality.

-#1161: Fix compilation error due to unused parameter 'tmp' on clang/32-bit ARM.

-#1160: Improved insert strategy for compressed sparse matrices to enhance performance and reduce reallocations.

-#1162: Rollback QR changes to fix build error related to StorageIndex conflicts.

-#1167: Avoid move assignment in ColPivHouseholderQR to enhance stability and compatibility with compilers.

-#1165: Added missing EIGEN_DEVICE_FUNC in assertions and improved code robustness.

-#1164: Enhance performance of sparse permutations by reducing memory allocations and optimizing data handling.

-#1168: Introduce thread-local storage for is_malloc_allowed() to enhance safety in multi-threaded applications.

-#1136: Review and cleanup of compiler version checks to enhance readability and maintainability.

-#1169: Replace deprecated $<CONFIGURATION> with $<CONFIG> for CMake compliance.

-#1166: Introduces custom ODR-safe assert to enhance C++20 module compatibility.

-#1170: Enhanced sparse matrix insertion and memory management, improving performance and efficiency.

-#1172: Refactored SparseMatrix for improved code consistency and readability.

-#1175: Improved corner case handling and efficiency of atan2 function, added to numext namespace, and fixed a bug in tests.

-#1179: Disabled vectorized rsqrt to ensure consistency with generic version.

-#1181: Fix bugs exposed by enabling GPU asserts and enhance GPU computation reliability.

-#1178: Fix sparse warnings to enhance code stability and reliability.

-#1176: Optimize mathematical packet operations for better accuracy and performance.

-#1180: Fixed critical sparse bugs with outerSize == 0 to enhance stability and prevent segmentation faults.

-#1183: Fix undefined behavior in Block access, eliminating UBSan errors.

-#1185: Enhanced robustness of the atan2 function for compatibility with TensorFlow using Clang.

-#1186: Updated ForwardDeclarations.h for improved clarity and maintainability.

-#1188: Reverted StlIterators edit to address concerns about undefined behavior.

-#1190: Use VERIFY_IS_EQUAL for zero comparisons to enhance code consistency.

-#1191: Improved LAPACKE configuration for better compatibility and complex type management.

-#1192: Enhanced EIGEN_DEVICE_FUNC compatibility for CUDA 10/11/12 and cleaned up warnings.

-#1189: Improved compatibility of SkewSymmetric<?> with CUDA by adding EIGEN_DEVICE_FUNC qualifiers.

-#1198: Optimized Power module by replacing eigen_asserts with eigen_internal_asserts to reduce runtime overhead in release builds.

-#1197: Remove LGPL code to ensure MPL2 compatibility and simplify licensing.

-#1200: Removed custom implementations of equal_to and not_equal_to, leveraging C++14 capabilities.

-#1199: Add IWYU export pragmas to top-level headers to enhance compatibility with tooling like clang-tidy.

-#1206: Update ColPivHouseholderQR_LAPACKE.h to enhance type handling for LAPACK operations involving complex numbers.

-#1201: Fix ODR violation with gemm_extra_cols on PPC to prevent crashes and enhance stability.

-#1208: Revert ODR changes and inline gemm functions to improve efficiency.

-#1209: Introduced functionality to print diagonal matrix expressions directly, enhancing debugging and efficiency.

-#1212: Disable array BF16 to F32 conversions in Power architecture to enhance stability and efficiency.

-#1213: Fix compiler warnings to enhance code quality and maintainability.

-#1215: Fix compiler warnings in tests to enhance code stability and maintainability.

-#1216: Fix typo in NEON make_packet2f return value to enhance correctness.

-#1218: Implements a correction in MSVC's atan2 for consistency with the POSIX spec.

-#1220: Resolved GCC compile issues and fixed preinterpret stack overflow in NEON packetmath.

-#1219: Optimizations for pasin_float and error handling fixes for psqrt_complex.

-#1221: Guard complex sqrt on old MSVC compilers to enhance compatibility.

-#1222: Fix epsilon value in long double for double doubles to improve algorithm convergence on PPC.

-#1223: Vectorize atanh, add atan definition, and unit tests for atan.

-#1226: Use pmsub in twoprod to improve pow() performance by ~1% on Skylake.

-#1229: Fix MSAN failures in SVD tests by initializing matrix entries to improve test robustness.

-#1230: Removed EIGEN_HAS_AVX512_MATH workaround to simplify code and improve compatibility.

-#1228: Enhanced compatibility on Power architecture by fixing vec_div issues across compiler versions.

-#1196: Enhance vectorized comparisons with typed comparison support for performance boost.

-#1239: Improve test reliability for NEON integer shift operations by handling zero argument cases.

-#1243: Fixes tensor comparison test to ensure accurate results.

-#1242: Improves memory allocation efficiency during tridiagonalization in eigenvector computation.

-#1233: Vectorize any() and all() in DenseBase, enhancing performance and flexibility for large matrix operations.

-#1241: Ensure CMAKE_* cache variables are set only for top-level projects to prevent build setting modifications.

-#1245: Modify failing cwise test to ensure it passes by using .abs() to avoid overflow issues.

-#1248: Fix typo in LinAlgSVD example code to ensure successful compilation and correct output.

-#1250: Replaced 'Lesser' with 'Less' for consistency and clarity.

-#1251: Added a newline to end of file to align with coding standards.

-#1252: Work around compiler bug in Tridiagonalization.h to enhance robustness.

-#1254: Make Select implementation backwards compatible to ensure stability with older versions.

-#1256: Fix bug in minmax_coeff_visitor for matrix of all NaNs to enhance robustness.

-#1257: Align handling of PropagateFast with PropagateNaN in minmax visitor.

-#1259: Reintroduced and added deadcode checks to enhance code quality.

-#1262: Limit build and link jobs for PowerPC to reduce OOM issues.

-#1263: Fix recent PowerPC warnings and clang warning.

-#1264: Introduced EIGEN_NOT_A_MACRO to enhance compatibility with TensorFlow and avoid build issues.

-#1265: Vectorize tensor.isnan() using typed predicates for improved AVX512 performance.

-#1266: Removed pool functionalities for CMake versions less than 3.11 to streamline build processes.

-#1268: Enhanced compatibility with CMake list handling for command-line argument parsing.

-#1267: Fixed various typographical errors to enhance code readability and professionalism.

-#1269: Reverted CMake pools changes to stabilize the build process by eliminating related errors.

-#1271: Enhanced SparseMatrix with updated Map typedef and improved overflow checks in setFromTriplets.

-#1273: Replaced internal::(U)IntPtr with std::(u)intptr_t and removed ICC workaround to enhance compatibility and code clarity.

-#1234: Removed unused BLAS/LAPACK declarations to enhance maintainability and reduce signature conflicts.

-#1276: Optimized generic_rsqrt_newton_step for enhanced accuracy and performance.

-#1279: Refactor indexed view methods to enable non-const reference access with symbolic indices, improving usability and maintainability.

-#1283: Use correct truncating intrinsic for double-to-int casting to improve accuracy.

-#1148: Guarded malloc(), realloc(), and free() with check_that_malloc_is_allowed() and improved error handling by replacing abort with exceptions.

-#1284: Cleanup and enhance packet math by removing unused components and adding missing specializations.

-#1286: Improve handling of non-const symbolic indexed views by checking for l-value-ness.

-#1287: Prevent crash on empty tensor contraction by omitting assert and returning nullptr for size 0 allocations.

-#1288: Updated documentation for Eigen 3.4.x to resolve build errors and enhance clarity.

-#1291: Ensure Eigen/Core and Eigen/src/Core are not ignored due to core rule on Windows.

-#1294: Improve accuracy of erf() with refined rational approximation and better clamping methods.

-#1295: Refactor IndexedView to enhance readability and maintainability by reducing SFINAE verbosity.

-#1299: Introduces BF16 pcast functions and reorganizes type casting in TypeCasting.h.

-#1298: Improved tensor select evaluator performance using select ternary op for enhanced efficiency.

-#1303: Ensure erf() returns +/-1 above clamping and enhance performance, particularly for AVX2 on Skylake.

-#1306: Removed last occurrences of the unused enum HasHalfPacket for codebase cleanliness.

-#1308: Fix pow for uint32_t, disable pmul to enhance robustness and prevent compilation issues.

-#1309: Introduces the Abs2 function for Packet4ul, enhancing functionality.

-#1312: Fixed boolean bitwise and warning in test code.

-#1311: Fixed sparse iterator compatibility issues and deprecated function warnings on macOS using Clang.

-#1304: Specializes evaluator for scalar_cast_op to optimize handling of different packet types.

-#1316: Implemented pcmp, pmin, and pmax for Packet4ui, improving compliance and test stability.

-#1305: Enhance StridedLinearBufferCopy with half-Packet operations for improved performance.

-#1318: Set m_nonzeroSingularValues to zero when input is not finite to enhance stability.

-#1319: Fix ColMajor BF16 GEMV for mixed RowMajor vector compatibility.

-#1321: Cleaned up array_cwise test by suppressing warnings, resolving ambiguities, and removing redundant tests.

-#1322: Corrected loadColData implementation to fix BF16 GEMV compatibility with LLVM.

-#1323: Fix modulo by zero compiler warning to enhance code robustness.

-#1325: Renamed array_cwise test and suppressed compiler warnings to enhance clarity and reduce message noise.

-#1289: Moved thread pool code from Tensor to Core to enhance accessibility for future developments.

-#1324: Update ndtri to return NaN for out-of-range inputs, ensuring consistency with SciPy and MATLAB.

-#1329: Introduce macros to override synchronization primitives in Eigen ThreadPool for customization.

-#1333: Fix compiler warnings and failures in JacobiSVD and BDCSVD by initializing matrix members.

-#1334: Fix unrolled assignment evaluator to enhance access patterns for small fixed-size arrays and matrices.

-#1335: Introduced functions for adding/removing outer vectors in SparseMatrix for enhanced structure management.

-#1336: Introduces linear redux evaluators to enhance expression evaluation performance.

-#1337: Clean up Redux.h and fix vectorization_logic test after traversal order changes.

-#1339: Adjusted EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC for CUDA to resolve compilation issues.

-#1338: Optimized error handling in scalar_unary_pow_op for better performance and robustness.

-#1342: Reduce max relative error of prsqrt from 3 to 2 ulps.

-#1343: Improved error handling and testing for unary power functions in Eigen.

-#1344: Enhanced numerical stability in prsqrt function to prevent underflow.

-#1328: Partially vectorizes cast for enhanced performance and safety in vectorized operations.

-#1347: Introduced compile- and run-time assertions for Ref<const> construction to enhance memory layout safety.

-#1346: Introduce a move constructor for Ref<const...> to enhance performance by reducing unnecessary copying.

-#1351: Streamlined SVD testing to improve CI stability and reduce resource consumption.

-#1350: Improved safe_abs function in int_pow for better Clang compatibility.

-#1353: Removed deprecated function calls in SVD test to improve maintainability.

-#1352: Enhanced precision and performance of rint, round, floor, and ceil functions.

-#1354: Add optional offset parameter to ploadu_partial and pstoreu_partial for API consistency.

-#1355: Disabled FP16 arithmetic for arm32 to enhance stability and compatibility with Clang compiler limitations.

-#1345: Add Quaternion constructor from real scalar and imaginary vector to simplify common expressions.

-#1360: Fix ivcSize return type for enhanced type consistency and reliability.

-#1362: Fix argument for _mm256_cvtps_ph imm parameter to eliminate MSVC warning C4556.

-#1361: Fixed Altivec compilation with C++20 and higher by addressing simple-template-id issues.

-#1363: Fix use of arg function in CUDA for improved compatibility with MSVC and C++20.

-#1358: Addressed various compiler warnings to improve code stability and readability.

-#1364: Optimize check_rows_cols_for_overflow for better compile-time performance with matrix size checks.

-#1367: Fix gcc warnings by addressing subtle bugs and improving code clarity.

-#1369: Fix ARM build warnings by improving type casting and variable shadowing.

-#1370: Fixes -Waggressive-loop-optimizations warning for better compilation with gcc 10+.

-#1371: Fix -Wmaybe-uninitialized warning in SVD by initializing dimensions correctly.

-#1373: Added max_digits10 function to enhance decimal digit representation and improve floating-point serialization.
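
As a rough illustration of why `max_digits10` matters for serialization, the sketch below uses the standard library's `std::numeric_limits<double>::max_digits10` rather than any Eigen API. The helper names (`to_roundtrip_string`, `to_default_string`, `from_string`) are hypothetical.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: print with max_digits10 significant digits, which is
// enough for a decimal string to convert back to the identical double.
inline std::string to_roundtrip_string(double value) {
  std::ostringstream out;
  out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
  return out.str();
}

// Hypothetical helper: print with the stream's default precision (6 digits),
// which can drop information.
inline std::string to_default_string(double value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

// Parse a decimal string back into a double.
inline double from_string(const std::string& text) { return std::stod(text); }
```

For `double x = 0.1 + 0.2;`, parsing `to_roundtrip_string(x)` recovers `x` exactly, whereas `to_default_string(x)` yields `"0.3"`, which parses to a different value.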

-#1372: Fix compatibility issues with TensorFlow on Power architecture, enhancing performance and reliability.

-#1376: Fix nullptr dereference issue in triangular product for zero-sized matrices.

-#1331: Add test to validate SYCL in Eigen core enhancing compatibility.

-#1377: Fix to prevent undefined behavior in triangular solves with empty systems.

-#1378: Fix clang-tidy warning by replacing std::move() with std::forward() for better handling of lvalues and rvalues.

-#1379: Fix nullptr dereference in SVD to enhance robustness and prevent runtime errors.

-#1380: Fixes undefined behavior by ensuring proper memory alignment for scalars, enhancing library stability.

-#1381: Updates boost MP test suite to reference new SVD tests for improved reliability.

-#1382: Fix tensor StridedLinearBufferCopy by preventing negative indices and enhancing robustness.

-#1383: Introduces a temporary macro for handling unaligned scalar UB to address TFLite-related issues.

-#1384: Add IWYU private pragmas to internal headers to enhance tooling capabilities.

-#1385: Rename plugin headers to .inc to improve management and usability.

-#1387: Introduced method for handling block expressions, improving block unwinding and compatibility.

-#1388: Ensure stage is not 'ok' if Pardiso returns an error, improving error handling.

-#1389: Introduces new panel modes for GEMM MMA enhancing performance for real and complex matrices.

-#1391: Exported ThreadPool symbols to silence Clang include-cleaner warnings.

-#1394: Fix extra semicolon in XprHelper to resolve compilation error with -Wextra-semi flag.

-#1396: Fixes longstanding bug in sparse triangular view iterator by restoring row() and col() functions.

-#1397: Consolidated multiple implementations of divup/div_up/div_ceil to streamline code maintenance and clarity.

-#1398: Resolved compile errors by eliminating use of _res.

-#1400: Modify div_ceil to pass arguments by value, reducing odr-usage errors.
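
A minimal sketch of an integer ceiling division that takes its arguments by value is given below. `div_ceil_sketch` and `TileConfig` are hypothetical names, and the sketch assumes `a >= 0` and `b > 0`; Eigen's own helper may be written differently. Because the arguments are copied, passing a `static constexpr` data member only reads its value and does not odr-use it, so no out-of-line definition is needed before C++17.

```cpp
#include <cassert>

// Hypothetical stand-in for an integer ceiling division helper.
// Preconditions (assumed): a >= 0 and b > 0.
template <typename T>
constexpr T div_ceil_sketch(T a, T b) {
  // (a - 1) / b + 1 avoids the overflow that (a + b - 1) / b can hit.
  return a == 0 ? T(0) : (a - 1) / b + 1;
}

struct TileConfig {
  // Hypothetical tile width, declared in-class without a separate definition.
  static constexpr int kTileSize = 4;
};

// Usable at compile time: 10 items need 3 tiles of 4.
static_assert(div_ceil_sketch(10, TileConfig::kTileSize) == 3, "ceil(10 / 4) is 3");
```

Had the parameters been `const T&`, binding `TileConfig::kTileSize` to the reference would odr-use the member and could produce an undefined-reference link error in C++11/14 builds.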

-#1402: Work around MSVC issue in Block XprType, enhancing compatibility.

-#1404: Avoid building docs if cross-compiling or not top level to streamline the build process.

-#1399: Disable denorm deprecation warnings in MSVC C++23 for cleaner build output.

-#1406: Replaced divup with div_ceil in TensorReduction to remove deprecation warnings.

-#1407: Fix -Wshorten-64-to-32 warnings in div_ceil to enhance code robustness.

-#1410: Fix int overflow in cxx11_tensor_gpu_1 test using DenseIndex.

-#1411: Fix typo to allow nomalloc test to pass on AVX512.

-#1412: Backport fix for disambiguating overloads with empty index lists to address compilation errors.

-#1408: Generalize parallel GEMM to work with ThreadPool in addition to OpenMP.

-#1413: Optimized traits<Ref>::match to use correct strides for performance enhancement and memory management.

-#1415: Linked pthread for product_threaded test to ensure successful execution of multi-threaded tests.

-#1416: Fix -Wshorten-64-to-32 warning in gemm parallelizer for improved code quality.

-#1417: Fixed bug in getNbThreads() to return 1 when not parallelized.

-#1421: GEMV micro-optimization that improves loop performance and reduces compile-time warnings.

-#1422: Fix conversion of (u)int64_t to float on ARM to prevent data loss.

-#1424: Optimized matrix-vector operations in GeneralMatrixVector.h for improved performance.

-#1423: Introduce static assertions in Tensor constructors to ensure matching dimensions.

-#1425: Fix typecasting for arm32 to restore functionality and compatibility.

-#1419: Ensure that mc is not smaller than Traits::nr to prevent potential errors in calculations.

-#1429: Applied clang-format for consistent coding style across the codebase.

-#1430: Added .git-blame-ignore-revs file to improve git blame clarity.

-#1431: Fix scalar_logistic_function overflow for complex inputs to enhance robustness and accuracy.

-#1432: Applied clang-format-17 across the library to improve code consistency and readability.

-#1433: Improved formatting for .git-blame-ignore-revs to enhance clarity and usability.

-#1434: Fix CUDA syntax error introduced by clang-format to enhance code quality.

-#1435: Protect kernel launch syntax from clang-format to prevent syntax errors.

-#1436: Add internal ctz/clz implementation for enhanced random number generation and pointer alignment checking.
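
For readers unfamiliar with these operations, here is a portable loop-based sketch of count-leading-zeros and count-trailing-zeros for 32-bit values, plus one way a trailing-zero count can be used for an alignment check. The names `clz32_sketch`, `ctz32_sketch`, and `is_aligned_to` are hypothetical; the actual internal implementation may rely on compiler builtins or a different algorithm.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zero bits; returns 32 for an input of zero.
inline int clz32_sketch(std::uint32_t x) {
  if (x == 0) return 32;
  int n = 0;
  while ((x & 0x80000000u) == 0) {  // shift left until the top bit is set
    x <<= 1;
    ++n;
  }
  return n;
}

// Count trailing zero bits; returns 32 for an input of zero.
inline int ctz32_sketch(std::uint32_t x) {
  if (x == 0) return 32;
  int n = 0;
  while ((x & 1u) == 0) {  // shift right until the low bit is set
    x >>= 1;
    ++n;
  }
  return n;
}

// An address is aligned to 2^k bytes exactly when its low k bits are zero,
// i.e. when its trailing-zero count is at least k.
inline bool is_aligned_to(std::uint32_t address, int log2_alignment) {
  return ctz32_sketch(address) >= log2_alignment;
}
```

For example, address 64 has six trailing zero bits and is therefore 16-byte aligned (`log2_alignment == 4`), while address 24 has only three and is not.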

-#1439: Fix MSVC clz to correct leading zero count functionality.

-#1428: Set up clang-format in CI to ensure consistent code formatting.

-#1441: Fixed clang-format CI to run in non-interactive mode and ensured proper installation.

-#1446: Remove C++11 features from ctz/clz to restore compatibility with earlier C++ standards.

-#1409: Addressed compiler warnings and fixed significant bugs in Memory.h.

-#1448: Addressed MSAN failures by ensuring matrices are initialized, improving memory operation reliability.

-#1449: Enhanced memory safety by replacing function pointers with lambdas in GPU code with Clang and asan.

-#1447: Fix various asan errors to enhance stability and reliability by addressing memory management issues.

-#1450: Simplified stableNorm to suppress GCC warnings and improve efficiency.

-#1445: Add factor getters to Cholmod LLT/LDLT for enhanced solver functionality.

-#1438: Improve documentation of SparseLU module for enhanced user comprehension.

-#1453: Fixes TensorForcedEval copying issues to prevent memory management errors.

-#1456: Ensure pointers are checked before being freed to enhance memory safety.

-#1458: Fixes stableNorm to handle zero-sized inputs, enhancing robustness.

-#1443: Update CI with testing framework from eigen_ci_cross_testing to enhance testing processes.

-#1459: Add missing constexpr qualifier to enhance compile-time evaluation capabilities.

-#1457: Added assertions for .chip to enhance robustness and error handling.

-#1460: Reverted cleanup of stableNorm to restore performance for large vectors.

-#1461: Fix unused warnings in failtest to enhance code quality and developer experience.

-#1444: Use smaller index types to enhance robustness during resize operations.

-#1451: Fix build error due to Index/StorageIndex mismatch in SPQR::compute().

-#1466: Implements and refines assertions for dimension indices in chipping operations.

-#1454: Add half and quarter vector support to HVX architecture for improved performance.

-#1467: Fix compile-time error and enhance error detection with static asserts.

-#1469: Enhanced C++ standards compliance by removing explicit specialization, improving compatibility with gcc and MSVC.

-#1470: Various formatting improvements for better code readability and consistency.

-#1471: Updated LAPACK CPU time functions for consistency with standard naming conventions.

-#1477: Removed an obsolete relicense script to streamline the codebase.

-#1473: Update documentation for LAPACK's second and dsecnd functions to improve clarity and usability.

-#1478: Fix bug in checking subnormals to enhance accuracy in numerical operations.

-#1481: Fix CI for clang-6 cross-compilation ensuring consistent GLIBC versions.

-#1479: Fix broken formatting in Eigen::Tensor README.md.

-#1483: Use stableNorm in ComplexEigenSolver for improved result stability.

-#1482: Fix preshear transformation to restore proper functionality and add test.

-#1476: Fix a bunch of ODR violations to enhance code clarity and consistency.

-#1437: Enhance random number generation to improve entropy for 64-bit scalars.

-#1486: Fix gcc-6 bug in the rand test by adding noinline attribute to ensure correct behavior.

-#1487: Improve skew-symmetric test reliability by excluding problematic dimensions.

-#1485: Enhanced robustness of PPC testing by removing constraints on random integer generation and fixing overflow issues.

-#1488: Fix tests for bfloat16 and half scalar types, improving test reliability.

-#1489: Fix undefined behavior in getRandomBits to improve code safety and reliability.

-#1490: Fix UB in bool packetmath test by using only valid boolean values, enhancing reliability.

-#1494: Fix segfault in CholmodBase::factorize() for zero matrix to enhance stability.

-#1492: Fix C++20 error related to arithmetic between different enumeration types.

-#1491: Applied clang-format to lapack/blas directories for code consistency.

-#1496: Fix division by zero UB in packet size logic.

-#1499: Eliminated warning about writing bytes directly to non-trivial type, enhancing code clarity and reducing compiler warnings.

-#1500: Implements explicit scalar conversion in ternary expressions, enhancing type safety and correctness.

-#1504: Fixes undefined behavior in pabsdiff for ARM with latest compilers, enhancing stability.

-#1507: Fix deflation in BDCSVD, enhancing stability and correctness for large matrices.

-#1506: Use traits::Options for consistency across Eigen objects.

-#1503: Fix random for custom scalars without constexpr digits() to enhance compatibility.

-#1509: Renamed generic_fast_tanh_float to ptanh_float and improved code readability.

-#1501: Introduced SIMD complex function pexp_complex for float, enhancing performance and compatibility for complex number operations.

-#1513: Fix pexp_complex_test for compliance with C++ standard.

-#1514: Fix exp complex test by using int instead of index to improve correctness and clarity.

-#1510: Enhance real Schur decomposition robustness and improve polynomial solver error checking.

-#1516: Fix GPU build for ptanh_float.

-#1517: Fix use of uninitialized memory in kronecker_product test.

-#1511: Enabled direct access for IndexedView with data method and strides for improved performance.

-#1518: Standardized header guards in key files to fix a build error.

-#1521: Fix crash in IncompleteCholesky when the input has zeros on the diagonal.

-#1520: Removed 'using namespace Eigen' from blas/common.h to prevent symbol collisions.

-#1519: Change array_size result from enum to constexpr for enhanced type safety and reduced compiler warnings.

-#1523: Speed up SparseQR, cutting execution time from 256 to 200 seconds.

-#1524: Fixed signed integer overflows in random number generation for improved stability.

-#1525: Speed up sparse x dense dot product, reducing computation time significantly.

-#1527: Delete shadowed typedefs to improve code clarity and maintainability.

-#1528: Fix QR colpivoting warnings and test failure by using numext::abs for floating-point types.

-#1531: Add degenerate checks before calling BLAS routines to handle zero-sized matrices or vectors safely.

-#1532: Updated error message to clarify the C++14 requirement, enhancing clarity for users.

-#1530: Eliminated FindCUDA CMake warning to enhance the build process.

-#1529: Fix triangular matrix-vector multiply uninitialized warning by removing const_cast and clarifying logic.

-#932: Replaced make_coherent with CoherentPadOp for improved performance in handling derivative sizes.

-#1536: Fix unaligned access in trmv to enhance memory alignment handling.

-#1537: Fix static_assert for better C++14 compatibility.

-#1538: Return 0 volume for empty AlignedBox, fixing erroneous negative volume calculation.
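
The negative-volume problem can be reproduced with a small stand-alone sketch. `Box3` and its helpers are hypothetical and independent of Eigen's `AlignedBox`; the sketch assumes a box counts as empty when `min` exceeds `max` along any axis.

```cpp
#include <cassert>

// Hypothetical axis-aligned box in 3D, stored as per-axis min/max corners.
struct Box3 {
  double min[3];
  double max[3];
};

// Assumed emptiness convention: min > max along at least one axis.
inline bool is_empty(const Box3& box) {
  for (int i = 0; i < 3; ++i) {
    if (box.min[i] > box.max[i]) return true;
  }
  return false;
}

// Product of the extents with no emptiness check; this can be negative
// (or misleadingly positive) when the box is empty.
inline double naive_volume(const Box3& box) {
  double v = 1.0;
  for (int i = 0; i < 3; ++i) v *= box.max[i] - box.min[i];
  return v;
}

// Guarded version: an empty box always has zero volume.
inline double volume(const Box3& box) {
  return is_empty(box) ? 0.0 : naive_volume(box);
}
```

A box with `min = (1, 0, 0)` and `max = (0, 1, 1)` has extents `(-1, 1, 1)`, so the unguarded product is `-1`, while the guarded version returns `0`.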

-#1533: Enhanced test coverage for the pexp function, increasing reliability.

-#1535: Fix deprecated anonymous enum-enum conversion warnings to enhance code compliance with C++ standards.

-#1541: Fix packetmath plog test on Windows by switching to numext::log for improved accuracy.

-#1539: Allow aligned assignment in TRMV to simplify edge-case handling and enhance code stability.

-#1540: Fix pexp test for ARM to handle flushed subnormal values in 32-bit comparisons.

-#1542: Split up cxx11_tensor_gpu tests to reduce timeouts and improve test reliability on Windows.

-#1543: Fix and enhance incomplete Cholesky decomposition handling.

-#1545: Enhancements to CwiseUnaryView for improved access and modification of complex array components.

-#1547: Fix const input and C++20 compatibility in unary view.

-#1549: Improved CwiseUnaryView const access by adding matrix mutability checks.

-#1550: Improved error messaging for unsupported rbegin/rend on GPU.

-#1553: Restore C++03 compatibility by manually constructing 2x2 matrices.

-#1551: Work around a compile issue in VS2015 by using static_cast.

-#1552: Enhanced MSVC compatibility for CwiseUnaryView by reorganizing code.

-#1557: Improved documentation in the Jacobi module by adjusting tag placement for applyOnTheRight.

-#1558: Removed slow index check in Tensor::resize for performance improvements and modernized codebase.

-#1555: Enhanced Matrix functions with constexpr for compile-time evaluation.

-#1559: Fix compatibility of SIMD intrinsics for 32-bit builds.

-#1562: Enhanced compatibility and stability by protecting the use of alloca on 32-bit ARM systems.

-#1561: Removed unnecessary 'extern C' in CholmodSupport for code simplification.

-#1564: Introduced vectorization for cross3_product, improving performance and MSVC compatibility.

-#1563: Introduced custom formatting for complex numbers, enhancing NumPy/native compatibility.

-#1566: Fix for Packet2l on Win32 enhances compatibility and reliability.

-#1568: Fix ScalarPrinter redefinition for gcc to enhance CI reliability.

-#1570: Use truncation rather than rounding when casting Packet2d to Packet2l to enhance accuracy.

-#1560: Implemented cwiseSquare function and fixed typo in cwiseCbrt, enhancing test coverage for matrix operations.

-#1565: Enhance compile-time expressions with symbols for efficient computations.

-#1571: Fix usages of Eigen::array for std::array compatibility.

-#1515: Fix random number generation for custom float types to enhance accuracy and minimize rounding bias.

-#1575: Improved handling of long double random number generation by refining fallback to double for unsupported configurations.

-#1577: Fix preverse for PowerPC to enhance compatibility and performance.

-#1576: Fixed preprocessor condition to restore fast float logistic implementation, enhancing performance and compatibility.

-#1578: Enhancements to Geometry_SIMD.h improve SIMD operations and compatibility.

-#1581: Add constexpr to accessors in DenseBase, Quaternions, and Translations to enhance compile-time functionality.

-#1583: Optimized pexp function for up to 6% faster performance.

-#1582: Refactor indexed view to fix warnings and errors in MSVC 14.16 build.

-#1573: Fix compiler warnings for MSVC and enhance code quality and cross-platform compatibility.

-#1584: Implements optimizations for Intel pblend, improving mask creation and comparison operations.

-#1522: Introduces SIMD implementation for double precision sincos, enhancing performance using Veltkamp method and Padé approximant.

-#1591: Fix compilation problems with PacketI on PowerPC for enhanced compatibility.

-#1590: Optimized pblend functionality with blend_mask_helper and enhanced auto vectorization for improved performance.

-#1593: Specialized evaluation for ternary operations to optimize (a < b).select(c, d) expression.

-#1594: Fix tridiagonalization_inplace_selector::run() for CUDA compatibility by adding EIGEN_DEVICE_FUNC.

-#1597: Fix autodiff enum comparison warnings to enhance code quality by reducing compilation warnings.

-#1596: Fix unused variable warnings in TensorIO to enhance code cleanliness and maintainability.

-#1598: Fix transposed matrix product bug to reduce unnecessary memory allocations.

-#1599: Prevent PPC runner from cross-compiling non-PPC targets to reduce build failures.

-#1601: Fixes sine and cosine functions on PPC by implementing a missing comparison function.

-#1602: Adjust error bound for nonlinear tests to account for AVX usage without FMA.

-#1604: Fix AVX512 predux_mul on MSVC for improved reliability and correctness.

-#1606: Fix undefined behavior for generating inputs to the predux_mul test to ensure reliable results.

-#1607: Relaxed hard-coded error bounds in nonlinear tests to enhance cross-platform reliability.

-#1605: Removed unnecessary semicolons to enhance code readability.

-#1493: Introduce trunc operation and improve code structure for SIMD operations.

-#1600: Optimizes transposed matrix products to reduce memory usage and improve performance.

-#1610: Fix new generic nearest integer ops on GPU for enhanced compatibility and performance.

-#1609: Enhanced reliability of orthonormality tests for eigenvectors by adjusting tolerance for matrix scaling.

-#1556: Reorganize CMake for better efficiency and usability in non-top-level builds.

-#1612: Introduces new bit shifting functionalities such as logical and arithmetic shift operators for integer types.

-#1613: Improved MSVC support by utilizing MSVC functions for 128-bit integer operations.

-#1614: Fix FFT when destination does not have unit stride by using a temporary buffer.

-#1611: Fix CMake package to correctly set include path for eigen target.

-#1615: Change predux on PowerPC for Packet4i to NOT saturate the sum of the elements for consistency.

-#1616: Fixed GCC 6 compile error by removing namespace prefixes from struct specializations.

-#1618: Fixed a clerical error in Matrix class documentation.

-#1621: Added checks for valid indices in SparseMatrix::insert to enhance robustness.

-#1622: Fix ubsan failure in array_for_matrix to enhance robustness and reduce undefined behavior.

-#1624: Improved performance by eliminating int to ptr casting in aligned_alloca.

-#1619: Suppress C++23 deprecation warnings for std::has_denorm and std::has_denorm_loss to enhance compatibility.

-#1625: Utilize __builtin_alloca_with_align for improved memory allocation efficiency.

-#1623: Reformatted EIGEN_STATIC_ASSERT() as a statement macro for consistency and maintainability.

-#1620: Fix compilation failures on constexpr matrices with GCC 14.

-#1628: Improve threading test reliability by adjusting header file inclusion and resolving C++20 warnings.

-#1629: Vectorize isfinite and isinf functions for improved performance.

-#1630: Fix warnings about repeated definitions of macros, enhancing code reliability.

-#1631: Suppressed GCC warnings on enum comparisons to enhance code quality.

-#1632: Vectorized allFinite() function achieving significant performance enhancement.

-#1633: Resolved warnings stemming from previous fixes, enhancing code quality and maintainability.

-#1635: Fixed warning C5054 by ensuring type-safe enum comparisons.

-#1636: Allow pointer_based_stl_iterator to conform to the contiguous_iterator concept in C++20, enhancing compatibility with modern C++ ranges.

-#1637: Fixed scalar pselect to handle NaN values consistently in MSVC's fast-math mode.

-#1644: Add async support for 'chip' and 'extract_volume_patches' with extensive testing.

-#1653: Corrected numerous typographical errors to enhance code clarity.

-#1649: Fix compiler warnings in BDCSVD by using placement new for object initialization.

-#1659: Updated .clang-format to support JavaScript files, improving formatting process.

-#1660: Updated eigen_navtree_hacks.js to enhance performance and usability.

-#1654: Introduced alignment macro to reduce atomic false sharing in RunQueue, enhancing multithreaded performance.

-#1656: Fixes multiple typos to enhance code readability and professionalism.

-#1645: Removed implicit 'this' capture in lambdas to prevent Clang warnings and enhance code clarity.

-#1658: Fix pi in kissfft to enhance accuracy in computations.

-#1651: Fixes and enhances conversion of Eigen::half to _Float16 in AVX512 to resolve compilation issues and improve code robustness.

-#1648: Fix Woverflow warnings in PacketMathFP16 with explicit short casts.

-#1661: Modified hlog to allow symbol lookup in local namespaces, improving function flexibility and consistency.

-#1650: Remove C++23 check around has_denorm deprecation suppression to prevent MSVC warnings.

-#1665: Cleanups to threaded product code and test for improved clarity and maintainability.

-#1666: Add a yield instruction in the two spinloops of the threaded matmul implementation to optimize CPU resource usage.

-#1668: Include <thread> for std::this_thread::yield(), improving thread management.

-#1667: Speed up StableNorm for non-trivial sizes and improve consistency between aligned and unaligned inputs.

-#1670: Speed up and improve accuracy of tanh with new rational approximation.

-#1675: Add vectorized implementation of tanh to enhance performance across various ISAs.

-#1672: Vectorize squaredNorm() for complex types to enhance performance and efficiency.

-#1677: Consolidated float and double implementations of patan() for improved performance and accuracy.

-#1678: Suppress Wmaybe-uninitialized warning in TensorVolumePatchOp by handling unreachable code.

-#1676: Fixed documentation visibility for GeneralizedEigenSolver::eigenvectors() method.

-#1681: Enhanced complex number trait handling and added tests for pnmsub.

-#1679: Suppressed Wmaybe-uninitialized warning in BDCSVD for better memory safety and code reliability.

-#1680: Enhanced TensorChipping with detection of 'effectively inner/outer' chipping for better data loading optimization.

-#1682: Added nvc++ support by fixing ARM NEON compilation errors and improving test flag handling.

-#1684: Vectorize atanh, improve standard compliance and performance for |x| >= 1.

-#1685: Fix out-of-range arguments to _mm_permute_pd to enhance stability.

-#1688: Fix bug for atanh(-1) improving stability and accuracy.

-#1690: Fixes bug in previous atanh implementation, enhancing accuracy and reliability.

-#1691: Updated NonBlockingThreadPool.h to use eigen_plain_assert for enhanced compatibility with non-C++26 projects.

-#1671: Optimized dot products with new evaluator and explicit unrolling for improved performance.

-#1692: Optimize dot product for enhanced performance on smaller vector sizes.

-#1693: Fix generic ceil for SSE2 to handle negative numbers near zero correctly.

-#1694: Make fixed size matrices and arrays trivially copy and move constructible to enhance performance and compatibility.

-#1626: Refactor code to use constexpr for data() functions, enhancing compile-time evaluation and optimization.

-#1697: Removed unneeded call to _mm_setzero_si128 to address issue #2858.

-#1698: Fix implicit conversion in TensorChipping to enhance code reliability and prevent unexpected behavior.

-#1699: Fix warning in EigenSolver::pseudoEigenvalueMatrix() for improved robustness and compatibility.

-#1700: Added debugging info to float_pow_test_impl and cleaned up array_cwise tests.

-#1701: Add missing EIGEN_DEVICE_FUNC annotations to enhance CUDA compatibility.

-#1702: Added max_digits10 to NumTraits for mpreal types to enhance precision handling.

-#1703: Fix inverse evaluator for CUDA devices by marking as host+device function.

-#1706: Enhanced speed and accuracy of erf() function with reduced error and significant performance gains.

-#1707: Fixes bug to avoid NaN in erf(x) for large |x| with maintained speedup.

-#1709: Use ppolevl for polynomial evaluation to enhance maintainability and set the stage for future optimizations.

-#1708: Enhanced robustness of the atan test for 32-bit ARM platforms, resolving test failures.

-#1710: Vectorize erfc() for float to enhance accuracy and performance.

-#1711: Fix DenseBase::tail for dynamic template arguments, enhancing flexibility and usability.

-#1712: Suppressed ARM array out-of-bounds warnings for reverseInPlace function on fixed-size matrices.

-#1704: Introduced a free-function swap for dense and sparse matrices to enhance compatibility with C++ standard algorithms.

-#1715: Introduced exp2(x) as a packet op and array method, enhancing precision with reduced error rates.

-#1716: Fix stack allocation assert by relocating the static assert, reducing overhead during evaluator instantiation.

-#1714: Add nextafter for bfloat16 to enhance precision and correctness in calculations.

-#1719: Add tests for sizeof() with one dynamic dimension to enhance coverage.

-#1722: Modified handling of matrix parameters to improve internal data alignment by avoiding passing by value.

-#1720: Fix NVCC builds for CUDA 10+, enhancing compatibility and reducing warnings.

-#1718: Fix OOB access in triangular matrix multiplication to enhance robustness.

-#1723: Fix clang6 compiler issues with optimization flags.

-#1725: Enhanced ARM compatibility by fixing clang6 failures and removing SSE reliance.

-#1724: Fix macro redefinition warning in FFTW test by removing default FFT macros from CMake test declarations.

-#1721: Ensure compatibility of EIGEN_ALIGNED_ALLOCA with nvc++ by replacing __builtin_alloca_with_align.

-#1726: Fixes GPU build issues by initializing constexpr global variables for CUDA compatibility.

-#1727: Make fixed-size objects trivially move assignable for improved performance.

-#1729: Add nvc++ compiler support in Eigen v3.4 for better compatibility.

-#1731: Use EIGEN_CPLUSPLUS instead of __cplusplus for better MSVC compatibility.

-#1736: Add missing EIGEN_DEVICE_FUNC decorations, enhancing compatibility and performance.

-#1737: Ensure fixed-size matrices conform to std::is_standard_layout, enhancing type safety and reducing compiler warnings.

-#1739: Use numeric limits for overflow checks instead of C99 macro, enhancing type safety and compatibility.

-#1740: Use old syntax for CMake's separate_arguments() to restore compatibility with old CMake versions.

-#1735: Make element accessors constexpr to enhance compile-time usability.

-#1741: Ensure destructors needed by lldb are non-inlined for proper debugging.

-#1742: Cast enum to int in Assign_MKL.h to resolve C++20 compatibility issues.

-#1743: Vectorize erf(x) for double precision with significant speed improvements using SIMD instructions.

-#1745: Fix C++20 constexpr test compilation failures, enhancing test suite compatibility.

-#1747: Optimized erf(x) by removing redundant computations for large arguments.

-#1748: Removed unnecessary check for HasBlend trait to enhance code efficiency and readability.

-#1750: Speed up exp(x) function by 30-35%, leveraging input characteristics for performance gains.

-#1751: Reverted a commit to restore stability in debug mode builds.

-#1752: Prevent premature overflow in exp(x) and improve performance by 3-4%.

-#1754: Simplify and speed up pow() by 5-6%.

-#1755: Optimize setConstant and setZero for better performance.

-#1756: Improve pow(x,y) with 25% speedup and increased accuracy for integer exponents.

-#1758: Add test for using pcast on scalars to enhance testing coverage.

-#1759: Refactor special case handling in pow(x,y) and revert to repeated squaring for <float,int> to enhance accuracy and efficiency.

-#1760: Fix UB in setZero to prevent undefined behavior with null destination arrays.

-#1761: Improved map fill logic to enhance flexibility and memory access patterns.

-#1762: Fixes IOFormat alignment by correcting the rowSpacer computation.

-#1764: Updated CI configuration to use ubuntu:latest, improving build reliability.

-#1763: Documentation improvements for move constructors and move assignments.

-#1765: Introduce deploy phase to CI for tagging successful nightly builds.

-#1769: Fix special packetmath erfc flushing for ARM32 to handle subnormals.

-#1771: Update deploy job to enhance efficiency and streamline steps.

-#1772: Updated git clone strategy to improve branch update reliability.

-#1775: Remove branch name from nightly tag job to simplify the tagging process.

-#1774: Implemented equality comparison operator for matrices with different sizes.

-#1776: Use alpine for deploying nightly tag, improving deployment efficiency.

-#1779: Enable fill_n and memset optimizations for construction and assignment.

-#1785: Add missing #include <new> to resolve build issue.

-#1786: Use omp_get_max_threads if setNbThreads is not set to improve threading behavior.

-#1790: Fix read of uninitialized threshold in SparseQR, enhancing code clarity and safety.

-#1792: Fixed std::fill_n reference to resolve namespace conflicts and improve code reliability.

-#1793: Zero-initialize test arrays to avoid uninitialized reads, enhancing test reliability and memory safety.

-#1794: Clarified documentation for complex number cross product.

-#1795: Eigen::aligned_allocator modified to not inherit from std::allocator, preventing incorrect method calls.

-#1791: Introduces ForkJoin-based ParallelFor algorithm to enhance ThreadPool with improved parallel execution.

-#1797: Improved compatibility and performance for loongarch architecture.

-#1799: Fix typo in NonBlockingThreadPool to improve task management functionality.

-#1796: Updated documentation to clarify non-square dimensions for block objects.

-#1802: Fixed initialization order and removed unused variables in NonBlockingThreadPool.h.

-#1803: Enhanced compatibility of threadpool with C++14 and fixed minor warnings.

-#1804: Fix potential data race on the spin_count_ member variable of NonBlockingThreadPool to enhance thread safety.

-#1801: Enhanced Simplicial Cholesky analyzePattern with advanced algorithms for improved performance.

-#1806: Fix UTF-8 encoding errors impacting compilation on MSVC and Apple Clang.

-#1810: Changed midpoint selection in Eigen::ForkJoinScheduler to enhance reliability and prevent out-of-bounds errors.

-#1805: Introduced matrixL() and matrixU() functions for accessing the L and U factors in IncompleteLU decomposition.

-#1811: Enhanced configuration for loongarch64 emulated tests in Eigen to improve flexibility and reliability.

-#1807: Fix all the doxygen warnings, enhancing documentation clarity and accuracy.

-#1813: Increased max alignment to 256 bytes for improved performance on modern ARM architectures.

-#1812: Build and deploy nightly Doxygen docs for enhanced accessibility and up-to-date resources.

-#1814: Add missing return statements for PPC to enhance code reliability and correctness.

-#1809: Fix issues in tensor documentation by correcting class name references.

-#1815: Update check for std::hardware_destructive_interference_size to improve compatibility on Android.

-#1816: Fix Android std::hardware_destructive_interference_size issue to ensure compatibility with Android NDK versions 25 and lower.

-#1817: Added EIGEN_CI_CTEST_ARGS for custom test timeouts and standardized argument naming.

-#1818: Enhanced documentation generation with nightly builds and improved Doxygen configuration.

-#1823: Added graphviz to doc build to fix broken graphs.

-#1821: Fix numerical issues with BiCGSTAB to enhance performance and robustness.

-#1824: Ensures condition number is zero for non-invertible matrices, enhancing rcond estimate reliability.

-#1826: Added missing MathJax/LaTeX configuration for proper formula rendering.

-#1825: Eliminate type-punning UB in Eigen::half with a safer bit-cast approach.

-#1827: Remove assumption of std::complex for complex scalar types, enhancing flexibility for user-defined complex types.

-#1828: Enhances TensorRef with flexible type assignments and consistent immutability.

-#1829: Refactored AssignEvaluator.h for enhanced readability and maintainability.

-#1831: Enhanced compatibility for Power builds without VSX and POWER8.

-#1820: Fix Warray-bounds warning for fixed-size assignments by optimizing vectorized traversal strategies.

-#1833: Fixes Warray-bounds in inner product to enhance stability and reliability.

-#1830: Make assignment operations constexpr for compile-time evaluation.

-#1834: Initialize matrix elements in bicgstab test to enhance reliability.

-#1836: Fix implicit copy-constructor warning in TensorRef.

-#1835: Fix bitwise operation error for C++26 compatibility.

-#1837: Implemented a system to retain nightly documentation, ensuring continuous availability despite pipeline failures.

-#1838: Simplified ForkJoin code and ensured test execution, enhancing ParallelFor API and performance.

-#1839: Specify constructor template arguments for ConstexprTest struct to suppress warnings.

-#1840: Fix boolean scatter and random generation for tensors, enhancing reliability and expanding test coverage.

-#1841: Fix docs job for nightlies to ensure consistency and reliability.

-#1842: Fix CMake Boost warning by updating configuration to resolve deprecated behavior.

-#1843: Fixes STL feature detection to support C++20, enhancing compatibility with various compilers and STL versions.

-#1844: Optimize division operations in TensorVolumePatch.h to reduce computational overhead.

-#1846: Refactor AssignmentFunctors.h to reduce redundancy and unify assignment operations.

-#1847: Fixes potential compilation errors by removing an unnecessary semicolon in DeviceWrapper.

-#1848: Improved TensorDeviceThreadPool.h by removing unused methods and enhancing functionality.

-#1849: Formatted TensorDeviceThreadPool.h and adopted if constexpr (a C++17 language feature) for enhanced readability and performance.

-#1850: Fix x86 complex vectorized FMA bugs, improving performance and accuracy.

-#1778: Added an install-doc target in CMake to improve documentation installation.

-#1851: Implemented a fix for the Givens rotation algorithm, enhancing accuracy and reliability.
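As an illustration of #1570 above (truncation rather than rounding when casting Packet2d to Packet2l), the scalar sketch below contrasts the two behaviors using only the C++ standard library. It assumes the intent is to match a plain C++ cast from double to a 64-bit integer, which truncates toward zero. The helper names cast_truncate and cast_round are hypothetical and are not Eigen functions.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: truncates toward zero, exactly like a plain C++ cast.
inline std::int64_t cast_truncate(double x) {
  return static_cast<std::int64_t>(x);
}

// Hypothetical helper: rounds to the nearest integer, with halfway cases
// rounded away from zero (the behavior of std::llround).
inline std::int64_t cast_round(double x) {
  return static_cast<std::int64_t>(std::llround(x));
}
```

With these helpers, an input of 2.7 truncates to 2 but rounds to 3, and -2.7 truncates to -2 but rounds to -3, so a vectorized cast that rounds would disagree with the scalar cast on such inputs.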

Backend Improvements

-#609: Optimize predux operations on AArch64 architecture for performance enhancement.

-#489: AVX512 and AVX2 support for Packet16i and Packet8i added, enhancing vectorization capabilities.

-#630: Fixed AVX integer packet issues by adding AVX2 protection and correcting AVX512 implementation.

-#618: Added EIGEN_DEVICE_FUNC labels to resolve CUDA 9 gpu_basic compilation issues.

-#639: Fixes in AVX2 PacketMath.h improve performance and stability by correcting typos and addressing unaligned load issues.

-#623: Introduces device-compatible tuple for GPU testing, addressing compatibility issues with std::tuple in Eigen.

-#659: Fix alias violation in BFloat16 enhancing reliability on PPC platforms.

-#663: Disable more CUDA warnings to reduce compilation output clutter.

-#668: Fix Windows CMake compiler/OS detection for improved build system reliability.

-#673: Vectorized Visitor.h with AVX2, enhancing coeffMax performance and matrix decompositions.

-#679: Disabled Tree reduction for GPU to eliminate memory errors and improve stability.

-#677: Use reinterpret_cast on GPU for bit_cast to improve performance by avoiding memcpy overhead.

-#534: Preliminary support for HIP bfloat16 GPU on AMD, setting the foundation for future optimizations.

-#680: Improved PowerPC packing performance and accuracy in non-vectorized operations.

-#704: Removed problematic implementation causing g++-11 crashes.

-#716: Converted diag pragmas to nv_diag for improved code consistency and maintenance.

-#734: Select AVX2 even if the data size is not a multiple of 8 to improve vectorization.

-#745: Fix for HIP compilation breakage in selfAdjoint and triangular view classes.

-#774: Introduced fixes to enable HIP unit tests and updated CMake configuration for compatibility.

-#773: Small speed-up in row-major sparse dense product by optimizing sparse_time_dense_product_impl for better parallelism.

-#789: Include immintrin.h for F16C intrinsics when vectorization is disabled.

-#764: Add MMA and performance improvements for VSX in GEMV for PowerPC.

-#816: Port EIGEN_OPTIMIZATION_BARRIER to support soft float ARM architectures.

-#820: Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512.

-#824: Removed inline assembly for FMA (AVX) and added packet ops: pmsub, pnmadd, pnmsub.

-#828: Fix number of block columns to prevent cache overflow on PowerPC in GEMV.

-#832: Fix AVX512 math function consistency and enable for ICC.

-#847: Cleaned up compiler warnings in GEMM & GEMV for PowerPC.

-#858: Fix and enhance sqrt/rsqrt functions for NEON with improved testing and accuracy.

-#869: Fix CMake for SYCL support, enhancing configuration and compatibility.

-#872: Enhanced sqrt/rsqrt for denormal handling improving performance on AVX512.

-#834: AVX512 optimizations for triangular solve improve performance for fp32/fp64 operations without matrix packing.

-#922: Work around MSVC compiler bug dropping const in transpose() and diagonal().

-#929: Split general_matrix_vector_product interface for Power into ColMajor and RowMajor macros to resolve TensorFlow compilation issues.

-#936: Performance improvements in GEMM for Power architecture enhancing efficiency and speed.

-#948: Fix compatibility issues with MSVC+CUDA, enhancing compilation and reducing warnings.

-#959: Restrict AVX512 trsm to AVX512VL and rename files for consistency.

-#960: Removed AVX512VL dependency in trsm to enhance compatibility and maintain performance.

-#860: Added AVX512 optimizations for matrix multiply enhancing large problem size performance.

-#983: Enhanced SYCL backend by extending QueueInterface for better integration with existing SYCL queues.

-#972: Add AVX512 s/dgemm optimizations for compute kernel to improve stability and performance.

-#988: Fix build issues with MSVC for AVX512 by disabling certain optimizations.

-#992: Improved AVX512 TRSM Kernels to respect EIGEN_NO_MALLOC configuration.

-#998: Fix tanh and erf to use vectorized version for EIGEN_FAST_MATH in VSX.

-#997: AVX512 TRSM kernels use alloca for memory allocation when EIGEN_NO_MALLOC is requested, enhancing performance under memory constraints.

-#1000: Enhanced GEMV performance for Power10 by optimizing load/store vector pairs.

-#1011: Improved pblend AVX implementation by optimizing blendv operations for better performance.

-#1024: Added Partial Packet support for GEMM real-only on PowerPC, fixed compilation warnings and reduced binary size.

-#1036: Replace malloc/free with aligned memory management in sparse classes for improved consistency.

-#1040: Specialize psign for AVX2 and avoid vectorizing psign for better performance.

-#1055: Call check_that_malloc_is_allowed() in aligned_realloc() to enhance memory management robustness.

-#1058: Add missing comparison operators for GPU packets to resolve build issues with CUDA.

-#1065: Fix for sparse matrix related breakage on ROCm.

-#1073: Add AVX int32_t pdiv for enhanced performance in integer division.

-#1076: Add vectorized integer division for int32 with AVX512, AVX or SSE, enhancing performance.

-#1018: Optimized gebp_kernel for arm64-neon using 3px8/2px8/1px8, improving performance through better register usage.

-#1086: Conditional vectorization of atan for Altivec only if VSX is available.

-#1075: Optimized sign function for complex numbers by using generic implementation only when vectorizable.

-#1111: Fixed Neon vectorization issues to enhance ARM performance and compatibility.

-#1115: Fixed a bug in the AVX2 implementation of psignbit, improving reliability and correctness.

-#1008: Add support for Power10 (AltiVec) MMA instructions for bfloat16 to enhance performance.

-#1104: Fix NEON instruction bug for half data type in 'fmla' function.

-#1131: Increased L2 and L3 cache sizes for Power10 to boost performance.

-#1129: Add BDCSVD_LAPACKE binding for improved SVD computations using LAPACKE.

-#1141: Enable NEON pabs for unsigned int types to enhance performance of absolute value operations.

-#1142: Fix incorrect NEON native fp16 multiplication.

-#1146: Enable NEON pcmp, plset, and complex psqrt for enhanced NEON support and performance.

-#1153: Fix guard macros for emulated FP16 operators on GPU to improve compatibility with CUDA.

-#1154: Improve performance for Power10 MMA bfloat16 GEMM with significant speed enhancements.

-#1150: Altivec fixes for Darwin to avoid unsupported VSX instructions on older PowerPC CPUs.

-#1126: Enabled Intel DPCPP Compiler support for Eigen's SYCL backend to enhance compatibility with SYCL-2020 features.

-#1174: Improve performance of bfloat16 MMA when dimensions are not multiples of 8 or 4.

-#1184: Fix bugs in pcmp_lt and pnegate, reactivate psqrt for pre-POWER8_VECTOR.

-#1202: Fix MSVC ARM build by resolving macro complications and improving vector type handling.

-#1207: Optimize psign for better performance in floating point operations.

-#1210: Optimized bfloat16 MMA GEMM for improved performance with an additional MMA accumulator.

-#1214: Optimize BF16 to F32 array conversions on Power architectures by reducing vector instructions.

-#1224: Add and enable Packet int divide for Power10 to enhance performance.

-#1227: Fixed null placeholder accessor issue in Reduction SYCL test for DPC++ compliance.

-#1232: Guard use of long double on GPU device to reduce warnings and prevent duplicate symbols.

-#1235: Fix ODR issues with Intel's AVX512 TRSM kernels to enhance linkage and performance.

-#1237: Fix GPU conv3d out-of-resources failure by enhancing internal variable handling.

-#1236: Added partial linear access for LHS & Output, achieving 30% faster bfloat16 GEMM MMA (Power).

-#1253: Streamline packetmath specializations for various backends using a macro to enhance readability and maintainability.

-#1249: Fix failing MSVC tests by replacing *_set1_* intrinsics to ensure consistency.

-#1255: Added MMA to BF16 GEMV for Power, achieving 5.0-6.3X speedup.

-#1258: Revert changes causing register spillage in BF16 GEMM for LLVM (Power), improving performance.

-#1270: Fix issues affecting ARM builds including missing cast, conversion issue for MSVC packets, and macro definitions for 32-bit ARM.

-#1272: Optimize casting operations for x86_64 architecture, enhancing performance, especially for bool casting.

-#1274: Optimize float->bool cast for AVX2, resulting in significant performance improvements.

-#1275: Added vectorized integer casts for x86 and removed redundant tests for performance enhancement.

-#1277: Fix incorrect casting in AVX512DQ path to enhance code reliability and performance.

-#1282: ASAN fixes for AVX512 GEMM/TRSM to address memory-related issues and enhance safety.

-#1293: Enable new AVX512 GEMM kernel by default, incorporating ASAN fixes.

-#1296: Add dynamic dispatch to BF16 GEMM (Power) and new VSX version for significant performance boost.

-#1297: Add Packet4ui, Packet8ui, and Packet4ul to the SSE/AVX PacketMath.h headers for improved SIMD operations with unsigned integers.

-#1307: New VSX version of BF16 GEMV for Power architecture, achieving up to 6.7X performance improvement.

-#1313: Added pmul and abs2 operations for Packet4ul type under AVX2, enhancing computational efficiency and compatibility.

-#1317: Unroll F32 to BF16 loop for 1.8X faster conversions on LLVM and improved GCC handling.

-#1320: Use std::shared_ptr for FFTW/IMKL FFT plan implementation to enhance memory management.

-#1327: Fixes CUDA compilation issues by rearranging header inclusions and adding necessary includes.

-#1341: Replaced CudaStreamDevice with GpuStreamDevice in tensor benchmarks for improved accuracy and reliability.

-#1349: Fixed AVX pstore implementation for correct aligned store with integer types.

-#1356: Unconditionally define EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC for ARM to enhance compilation stability.

-#1357: Fix supportsMMA to obey EIGEN_ALTIVEC_MMA_DYNAMIC_DISPATCH compilation flag and compiler support.

-#1359: Fix AVX512 nomalloc issues in trsm by disabling inappropriate memory allocations.

-#1365: Added missing pcasts for x86 architectures to enhance type conversion capabilities and cleaned up code.

-#1375: Added architecture definition files for Qualcomm Hexagon Vector Extension (HVX) to enhance compatibility and performance.

-#1386: Improved ARM32 float division accuracy and reliability by refining methods and adjusting tests.

-#1392: Fix call to static functions from device by adding EIGEN_DEVICE_FUNC attribute to run methods.

-#1393: Update to use ROCM_PATH instead of HIP_PATH for ROCm 6.0 compatibility.

-#1455: Introduced MI300 related test support for ROCm platforms.

-#1468: Fixes ARM32 issues by enhancing floating-point computations and accuracy.

-#1505: Disable float16 packet casting when native AVX512 f16 is available to enhance stability and correctness.

-#1495: Optimized JacobiSVD by removing unnecessary member variables for better performance and memory efficiency.

-#1526: Fix MSVC GPU build by resolving allocate() function conflict for better MSVC and NVCC compatibility.

-#1544: Introduced Packet2l for efficient int64_t operations with SSE, enhancing integer computation capabilities.

-#1546: Add support for casting between double and int64_t for SSE and AVX2.

-#1567: Improved 32-bit support by addressing double to int64 conversion issues and adding smoketests for Windows.

-#1569: Optimized SparseMatrix move operations for improved performance.

-#1572: Implemented AVX2 vectorized casting from double to int64_t, with performance and code clean-ups.

-#1574: Guard Packet4l definition in AVX to avoid conflicts and enhance stability.

-#1580: Add support for Packet8l to AVX512, optimizing performance and compatibility.

-#1585: Handle missing AVX512 intrinsic, improving stability and reliability with GCC.

-#1588: Fix build for pblend, psin_double, and pcos_double when AVX is supported but AVX2 is not.

-#1592: Enhanced psincos for PPC and fixed ARM32 test failures.

-#1595: Enhance CI scripts with Windows fixes and performance testing additions.

-#1641: Implemented AVX512F-based casting from double to int64_t for performance enhancement.

-#1639: Fix AVX512FP16 build failure by implementing vectorized cast specializations.

-#1655: Optimize ThreadPool spinning for enhanced performance and reduced latency.

-#1662: Enhanced performance of complex matrix multiplication with dynamic block panel size adjustment, improving speed by 8-33%.

-#1663: Optimized complex multiplication using vfmaddsub for SSE/AVX, enhancing performance.

-#1669: Introduces ARM NEON complex intrinsics for enhanced performance in complex number computations.

-#1673: Improved SVE intrinsic performance by using "_x" suffix instead of "_z", reducing instruction overhead.

-#1683: Introduces SSE and AVX implementations for complex FMA to enhance performance and accuracy.

-#1689: Fix a bug in pcmp_lt_or_nan and add sqrt support for ARM SVE.

-#1733: Add missing AVX predux_any functions for enhanced vectorized performance.

-#1734: Enhanced predux_any function using AVX for performance improvements.

-#1732: Vectorize erfc(x) for double and improve accuracy and performance for float.

-#1749: Disable fill_n optimization for MSVC to improve performance.

-#1753: Re-enable vectorized erf(x) for SSE and AVX for optimized performance.

-#1767: Update ROCm Docker image to Ubuntu 22.04 for improved stability.

-#1768: Transition to Ubuntu 24.04 in ROCm Docker for improved stability.

-#1770: Experiment with Alpine for slimmer Docker builds in CI.

-#1773: CI pipeline now uses commit tags for improved traceability and reliability.

-#1787: Add a missing CUDA device qualifier to enhance CUDA compatibility and performance.

-#1788: Removed unnecessary ToolChain PPA from CI configuration.

-#1832: Removed the -fno-check-new flag for Clang to reduce warnings.

Other Changes

-#608: Removed c++11-off CI jobs to streamline the CI process and focus on modern standards.

-#648: Fix typos in copyright dates.

-#662: Reorganize test main file to enhance maintainability and clarity.

-#794: Fixed duplicated header guards in AltiVec and ZVector packages to prevent conflicts.

-#842: Corrected typo in COD documentation from matrixR() to matrixT().

-#902: Temporarily disable aarch64 CI due to unavailability of Windows on Arm machines.

-#910: Reverted changes to PowerPC MMA flags due to premature merge.

-#919: Added a missing parenthesis in the tutorial to enhance code clarity.

-#1054: Fix typo in doc/TutorialSparse.dox to improve documentation clarity.

-#1074: Revert addition of C++14 constexpr support to restore stability and compatibility.

-#1143: Revert changes to type handling in CompressedStorage.h to restore previous functionality.

-#1173: Revert changes to QR tests to restore original functionality and maintain compatibility.

-#1302: Fix typo in SSE packetmath for improved code clarity.

-#1401: Fixed a typo in the comments for improved documentation clarity.

-#1452: Fix minor issues in basic slicing examples documentation.

-#1463: Revert addition of asserts for .chip due to test failures.

-#1642: Reverted the earlier "fix scalar pselect" change to maintain library integrity.

-#1640: Fix markdown formatting in README.md.

-#1766: Update ROCm Docker image in CI to enhance reliability.

-#1800: Clean up and fix the documentation of ForkJoin.h, focusing on typos and formatting.

-#1808: Fixed minor typos in ForkJoin.h to enhance documentation clarity.
