(ch_blas_lapack)=

# The Use of BLAS and LAPACK in PETSc and external libraries

1. BLAS 1 operations (and GPU equivalents): vector operations such as `VecNorm()`, `VecAXPY()`, and `VecScale()` are used extensively in PETSc. Depending on the
   simulation, vector sizes may range from hundreds of entries to many millions.
2. BLAS 2 operations: dense matrix-vector operations; the dense matrices are generally very small.
3. Eigenvalue and SVD computations, generally for very small matrices.
4. External packages such as MUMPS and SuperLU_DIST use BLAS 3 operations (and possibly BLAS 1 and 2). The
   dense matrices may be of modest size, going up to thousands of rows and columns.

For most PETSc simulations (that is, those not using certain external packages), an optimized set of BLAS/LAPACK routines
provides only a modest improvement in performance. For some external packages, using optimized BLAS/LAPACK can
improve performance dramatically.

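As a rough illustration of how the BLAS levels listed above differ, the following minimal sketch gives unoptimized, reference-style loops for one operation at each level. The function names `ref_axpy`, `ref_gemv`, and `ref_gemm` are hypothetical and chosen only for this example; they are not PETSc or BLAS entry points, and column-major storage is assumed to match the usual BLAS convention.

```c
#include <stddef.h>

/* Level 1 (vector-vector): y <- alpha*x + y.
   Touches O(n) data and performs O(n) work. */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y <- A*x for a dense m-by-n matrix A stored
   column-major with leading dimension m (an assumption of this sketch).
   Touches O(m*n) data and performs O(m*n) work. */
void ref_gemv(size_t m, size_t n, const double *A, const double *x, double *y)
{
  for (size_t i = 0; i < m; i++) y[i] = 0.0;
  for (size_t j = 0; j < n; j++)
    for (size_t i = 0; i < m; i++) y[i] += A[i + j * m] * x[j];
}

/* Level 3 (matrix-matrix): C <- A*B with A m-by-k, B k-by-n, C m-by-n,
   all column-major. Performs O(m*n*k) work on O(m*k + k*n + m*n) data. */
void ref_gemm(size_t m, size_t n, size_t k, const double *A, const double *B, double *C)
{
  for (size_t j = 0; j < n; j++) {
    for (size_t i = 0; i < m; i++) C[i + j * m] = 0.0;
    for (size_t p = 0; p < k; p++)
      for (size_t i = 0; i < m; i++) C[i + j * m] += A[i + p * m] * B[p + j * k];
  }
}
```

Only the level 3 loop performs many more floating-point operations than the number of entries it touches, which leaves an optimized library room to reuse data in cache. This is one plausible reason why the choice of BLAS tends to matter more for packages dominated by BLAS 3 than for the vector operations above.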
## 32 or 64-bit BLAS/LAPACK integers

BLAS/LAPACK libraries may use 32 or 64-bit integers. PETSc configure and compile handle this automatically
so long as the arguments to the BLAS/LAPACK routines are set to the type `PetscBLASInt`. The routine `PetscBLASIntCast(PetscInt, PetscBLASInt *)` casts
a `PetscInt` to the BLAS/LAPACK size. If the BLAS/LAPACK size is not large enough, it generates an error. For the vast majority of
simulations, even very large ones, 64-bit BLAS/LAPACK integers are not needed, even when 64-bit PETSc integers are used.

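The range check that such a cast performs can be pictured with a minimal, PETSc-independent sketch. The names `app_int`, `blas_int`, and `blas_int_cast()` are hypothetical stand-ins invented for this illustration, and the fixed 64-bit and 32-bit widths are an assumption. This is not PETSc's implementation, which determines the widths at configure time and reports failure through PETSc's own error handling.

```c
#include <stdint.h>

/* Hypothetical stand-ins: a 64-bit application index type and a 32-bit
   BLAS/LAPACK integer type. These are NOT PETSc's typedefs. */
typedef int64_t app_int;
typedef int32_t blas_int;

/* Narrow a to the BLAS/LAPACK integer width.
   Returns 0 on success; returns nonzero and leaves *b untouched
   when the value does not fit in the narrower type. */
int blas_int_cast(app_int a, blas_int *b)
{
  if (a < INT32_MIN || a > INT32_MAX) return 1;
  *b = (blas_int)a;
  return 0;
}
```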
The configure
option `--with-64-bit-blas-indices` attempts to locate and use a 64-bit integer version of the BLAS/LAPACK library. Except for MKL Cluster PARDISO,
most external packages do not support
64-bit BLAS/LAPACK integers, so if you are using such packages you cannot use 64-bit BLAS/LAPACK integers.

The configure options `--with-64-bit-indices` and `--with-64-bit-blas-indices` are independent. `--with-64-bit-indices` does not imply that the
BLAS/LAPACK libraries use 64-bit indices.

## Shared memory BLAS/LAPACK parallelism

Some BLAS/LAPACK libraries can make use of shared memory parallelism within the function calls, generally using OpenMP, or possibly PThreads.
If this feature is turned on, it is in addition to the MPI-based parallelism that PETSc is using. Thus it can result in oversubscription of hardware resources. For example,
if a system has 16 cores and PETSc is run with an MPI size of 16, then each core is assigned an MPI process. But if the BLAS/LAPACK is running with
OpenMP and 4 threads per process, this results in 64 threads competing to use 16 cores, which will perform poorly.

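The arithmetic in this example can be written out as a small sketch. The helper names `total_threads`, `is_oversubscribed`, and `max_threads_per_rank` are hypothetical and exist only for this illustration, which assumes one software thread per core is the target and ignores hardware multithreading.

```c
/* Total software threads when every MPI process runs its own
   threaded BLAS/LAPACK with threads_per_rank threads. */
int total_threads(int mpi_ranks, int threads_per_rank)
{
  return mpi_ranks * threads_per_rank;
}

/* Nonzero when more threads are requested than there are cores. */
int is_oversubscribed(int cores, int mpi_ranks, int threads_per_rank)
{
  return total_threads(mpi_ranks, threads_per_rank) > cores;
}

/* Largest per-process thread count that keeps the total within the
   core count; yields 0 when there are already more ranks than cores. */
int max_threads_per_rank(int cores, int mpi_ranks)
{
  return mpi_ranks > 0 ? cores / mpi_ranks : 0;
}
```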
If one elects to use both MPI parallelism and shared memory BLAS/LAPACK parallelism, one should ensure they do not oversubscribe the hardware
resources. Since PETSc does not natively use OpenMP, this means that phases of the computation that do not use BLAS/LAPACK will be under-subscribed,
thus under-utilizing the system. For PETSc simulations which do not use external packages, there is generally no benefit to using parallel
BLAS/LAPACK. The environment variable `OMP_NUM_THREADS` can be used to set the number of threads used by each MPI process for its shared memory parallel BLAS/LAPACK. The additional
environment variables `OMP_PROC_BIND` and `OMP_PLACES` may also need to be set appropriately for the system to obtain good parallel performance with
BLAS/LAPACK. The configure option `--with-openmp` will trigger PETSc to try to locate and use a parallel BLAS/LAPACK library.

Certain external packages such as MUMPS may benefit from using parallel BLAS/LAPACK operations. See the manual page `MATSOLVERMUMPS` for details on
how one can restrict the number of MPI processes while running MUMPS to utilize parallel BLAS/LAPACK.

(ch_blas_lapack_avail_libs)=

## Available BLAS/LAPACK libraries

Most systems (besides Microsoft Windows) come with a pre-installed BLAS/LAPACK that is satisfactory for many PETSc simulations.

The freely available Intel MKL mathematics libraries provide BLAS/LAPACK that generally perform better than the system-provided libraries
and are generally fine for most users.

For systems that do not provide BLAS/LAPACK, such as Microsoft Windows, PETSc provides the Fortran reference version
`--download-fblaslapack` and an f2c-generated C version `--download-f2cblaslapack` (which also supports 128-bit real number computations).
These libraries are less optimized but useful for getting started with PETSc easily.

PETSc also provides access to OpenBLAS via the `--download-openblas` configure option. OpenBLAS uses some highly optimized operations but falls back on reference
routines for many other operations. See the OpenBLAS manual for more information. The configure option `--download-openblas` provides a full BLAS/LAPACK implementation.

BLIS does not bundle LAPACK with it, so PETSc's configure attempts to locate a compatible system LAPACK library to use if `--download-blis` is
selected. One can use `--download-f2cblaslapack --download-blis`. This is recommended as a portable high-performance option. Be aware that if you use `--download-blis` without `--download-f2cblaslapack`, it is possible that the installed BLIS library will **not** be used! Instead, PETSc may link in some LAPACK implementation together with the BLAS that comes with that implementation.