(ch_blas_lapack)=

# The Use of BLAS and LAPACK in PETSc and external libraries

1. BLAS 1 operations (and GPU equivalents) - vector operations such as `VecNorm()`, `VecAXPY()`, and `VecScale()` are used extensively in PETSc. Depending on the
   simulation, the size of the vectors may range from hundreds of entries to many millions.
2. BLAS 2 operations - dense matrix with vector operations; generally the dense matrices are very small.
3. Eigenvalue and SVD computations, generally for very small matrices.
4. External packages such as MUMPS and SuperLU_DIST use BLAS 3 operations (and possibly BLAS 1 and 2). The
   dense matrices may be of modest size, going up to thousands of rows and columns.

For most PETSc simulations (that is, those not using certain external packages), using an optimized set of BLAS/LAPACK routines
provides only a modest improvement in performance. For some external packages, using optimized BLAS/LAPACK can
dramatically improve performance.

## 32 or 64-bit BLAS/LAPACK integers

BLAS/LAPACK libraries may use 32 or 64-bit integers. PETSc configure and compile handle this automatically
so long as the arguments to the BLAS/LAPACK routines are set to the type `PetscBLASInt`. The routine `PetscBLASIntCast`(`PetscInt`, `PetscBLASInt` \*) casts
a `PetscInt` to the BLAS/LAPACK size. If the value does not fit in the BLAS/LAPACK integer size, the routine generates an error. For the vast majority of
simulations, even very large ones, 64-bit BLAS/LAPACK integers are not needed, even when 64-bit PETSc integers are used.
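
The checked narrowing that `PetscBLASIntCast` is described as performing can be illustrated with a small stand-alone sketch. It does not use PETSc and is not PETSc's implementation: the names `SketchInt`, `SketchBLASInt`, and `sketch_blas_int_cast` are hypothetical, and the widths (a 64-bit PETSc integer and a 32-bit BLAS integer) are an assumption chosen to match the common case discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. In PETSc the real types are
   PetscInt and PetscBLASInt, and their widths are fixed at configure time. */
typedef int64_t SketchInt;     /* plays the role of a 64-bit PetscInt */
typedef int32_t SketchBLASInt; /* plays the role of a 32-bit PetscBLASInt */

/* Narrow a to the BLAS integer type. On success, store the result in *b and
   return 0. If a does not fit, return nonzero and leave *b untouched. */
static int sketch_blas_int_cast(SketchInt a, SketchBLASInt *b)
{
  assert(b != NULL);
  if (a < INT32_MIN || a > INT32_MAX) return 1;
  *b = (SketchBLASInt)a;
  return 0;
}
```

In PETSc code the same pattern would presumably be a call such as `PetscBLASIntCast(n, &bn)` on a `PetscInt n` and a `PetscBLASInt bn`, with the returned error code checked before `bn` is passed to a BLAS/LAPACK routine.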

The configure
option `--with-64-bit-blas-indices` attempts to locate and use a 64-bit integer version of the BLAS/LAPACK library. Except for MKL Cluster PARDISO,
most external packages do not support
64-bit BLAS/LAPACK integers, so if you are using such packages you cannot use 64-bit BLAS/LAPACK integers.

The configure options `--with-64-bit-indices` and `--with-64-bit-blas-indices` are independent. `--with-64-bit-indices` does not imply that the
BLAS/LAPACK libraries use 64-bit indices.

## Shared memory BLAS/LAPACK parallelism

Some BLAS/LAPACK libraries can make use of shared memory parallelism within the function calls, generally using OpenMP, or possibly PThreads.
If this feature is turned on, it is in addition to the MPI-based parallelism that PETSc is using. Thus it can result in over-subscription of hardware resources. For example,
if a system has 16 cores and PETSc is run with an MPI size of 16, then each core is assigned an MPI process. But if the BLAS/LAPACK is running with
OpenMP and 4 threads per process, this results in 64 threads competing to use 16 cores, which will perform poorly.

If one elects to use both MPI parallelism and shared memory BLAS/LAPACK parallelism, one should ensure they do not over-subscribe the hardware
resources. Since PETSc does not natively use OpenMP, this means that phases of the computation that do not use BLAS/LAPACK will be under-subscribed,
thus under-utilizing the system.
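
The arithmetic in the 16-core example above can be written out as a short stand-alone sketch. The function names `total_threads` and `max_threads_per_rank` are hypothetical helpers for illustration, not part of PETSc or OpenMP, and the sketch assumes every MPI process uses the same number of BLAS/LAPACK threads. With 16 processes and 4 threads each, `total_threads` gives 64, while `max_threads_per_rank(16, 16)` gives 1, which is the largest per-process thread count that avoids over-subscribing 16 cores.

```c
#include <assert.h>

/* Total software threads when every MPI process runs BLAS/LAPACK with the
   same number of threads (hypothetical helper for illustration). */
static int total_threads(int mpi_ranks, int threads_per_rank)
{
  assert(mpi_ranks > 0 && threads_per_rank > 0);
  return mpi_ranks * threads_per_rank;
}

/* Largest per-process thread count that keeps ranks * threads <= cores.
   Returns 0 when the MPI processes alone already exceed the core count. */
static int max_threads_per_rank(int cores, int mpi_ranks)
{
  assert(cores > 0 && mpi_ranks > 0);
  return cores / mpi_ranks;
}
```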
For PETSc simulations that do not use external packages, there is generally no benefit to using parallel
BLAS/LAPACK. The environment variable `OMP_NUM_THREADS` can be used to set the number of threads used by each MPI process for its shared memory parallel BLAS/LAPACK. The additional
environment variables `OMP_PROC_BIND` and `OMP_PLACES` may also need to be set appropriately for the system to obtain good parallel performance with
BLAS/LAPACK. The configure option `--with-openmp` will trigger PETSc to try to locate and use a parallel BLAS/LAPACK library.

Certain external packages such as MUMPS may benefit from using parallel BLAS/LAPACK operations. See the manual page `MATSOLVERMUMPS` for details on
how one can restrict the number of MPI processes while running MUMPS to utilize parallel BLAS/LAPACK.

(ch_blas_lapack_avail_libs)=

## Available BLAS/LAPACK libraries

Most systems (besides Microsoft Windows) come with pre-installed BLAS/LAPACK libraries that are satisfactory for many PETSc simulations.

The freely available Intel MKL mathematics libraries provide BLAS/LAPACK that generally perform better than the system-provided libraries
and are fine for most users.

For systems that do not provide BLAS/LAPACK, such as Microsoft Windows, PETSc provides the Fortran reference version
`--download-fblaslapack` and an f2c-generated C version `--download-f2cblaslapack` (which also supports 128-bit real number computations).
These libraries are less optimized but useful for getting started with PETSc easily.

PETSc also provides access to OpenBLAS via the `--download-openblas` configure option. OpenBLAS uses some highly optimized operations but falls back on reference
routines for many other operations. See the OpenBLAS manual for more information. The configure option `--download-openblas` provides a full BLAS/LAPACK implementation.

BLIS does not bundle LAPACK, so PETSc's configure attempts to locate a compatible system LAPACK library to use if `--download-blis` is
selected. One can use `--download-f2cblaslapack --download-blis`; this is recommended as a portable high-performance option. If you use `--download-blis` without `--download-f2cblaslapack`, it is possible that the installed BLIS library will **not** be used! Instead, PETSc will link in some LAPACK implementation and the BLAS that comes with that implementation!