(ch_blas_lapack)=

# The Use of BLAS and LAPACK in PETSc and external libraries

1. BLAS 1 operations (and GPU equivalents) - vector operations such as `VecNorm()`, `VecAXPY()`, and `VecScale()` are used extensively in PETSc. Depending on the
   simulation, the size of the vectors may range from hundreds of entries to many millions.
2. BLAS 2 operations - dense matrix-vector operations; generally the dense matrices are very small.
3. Eigenvalue and SVD computations, generally for very small matrices.
4. External packages such as MUMPS and SuperLU_DIST use BLAS 3 operations (and possibly BLAS 1 and 2). The
   dense matrices may be of modest size, going up to thousands of rows and columns.

For most PETSc simulations (that is, those not using certain external packages), an optimized set of BLAS/LAPACK routines
provides only a modest improvement in performance. For some external packages, using optimized BLAS/LAPACK can make a
dramatic improvement in performance.

## 32 or 64-bit BLAS/LAPACK integers

BLAS/LAPACK libraries may use 32 or 64-bit integers. PETSc configure and compile handle this automatically
so long as the arguments to the BLAS/LAPACK routines are set to the type `PetscBLASInt`. The routine `PetscBLASIntCast`(`PetscInt`, `PetscBLASInt` \*) casts
a `PetscInt` to the BLAS/LAPACK size. If the BLAS/LAPACK size is not large enough, it generates an error. For the vast majority of
simulations, even very large ones, 64-bit BLAS/LAPACK integers are not needed, even when 64-bit PETSc integers are used.

The configure option `--with-64-bit-blas-indices` attempts to locate and use a 64-bit integer version of the BLAS/LAPACK library. Except for MKL Cluster PARDISO,
most external packages do not support 64-bit BLAS/LAPACK integers, so if you are using such packages you cannot use 64-bit BLAS/LAPACK integers.

The configure options `--with-64-bit-indices` and `--with-64-bit-blas-indices` are independent. `--with-64-bit-indices` does not imply that the
BLAS/LAPACK libraries use 64-bit indices.
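
As a sketch of how the two options combine, the following `configure` invocations show them used separately and together. The option names come from the text above; running `./configure` from the top of a PETSc source tree is an assumption about the build setup, and the two commands are alternatives, not a sequence.

```shell
# Assumes the current directory is the top of a PETSc source tree.

# 64-bit PetscInt only; this alone does not request 64-bit BLAS/LAPACK integers.
./configure --with-64-bit-indices

# Alternative: 64-bit PetscInt plus a request for a 64-bit integer BLAS/LAPACK.
./configure --with-64-bit-indices --with-64-bit-blas-indices
```

The second form is only usable when none of the selected external packages require 32-bit BLAS/LAPACK integers, as noted above.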

## Shared memory BLAS/LAPACK parallelism

Some BLAS/LAPACK libraries can make use of shared memory parallelism within the function calls, generally using OpenMP, or possibly PThreads.
If this feature is turned on, it is in addition to the MPI-based parallelism that PETSc is using. Thus it can result in over-subscription of hardware resources. For example,
if a system has 16 cores and PETSc is run with an MPI size of 16, then each core is assigned an MPI process. But if the BLAS/LAPACK is running with
OpenMP and 4 threads per process, this results in 64 threads competing to use 16 cores, which will perform poorly.
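
The arithmetic in this example can be checked with a few lines of shell. This is only an illustrative sketch: the variable names are hypothetical, and the values are the ones from the example above.

```shell
# Values from the example above; variable names are illustrative only.
cores=16            # hardware cores on the system
mpi_ranks=16        # MPI processes PETSc is run with
blas_threads=4      # OpenMP threads each process uses inside BLAS/LAPACK

# Every MPI process spawns its own team of BLAS/LAPACK threads.
total_threads=$((mpi_ranks * blas_threads))

if [ "$total_threads" -gt "$cores" ]; then
  status="over-subscribed"
else
  status="ok"
fi

# prints: 64 threads on 16 cores: over-subscribed
echo "$total_threads threads on $cores cores: $status"
```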

If one elects to use both MPI parallelism and shared memory BLAS/LAPACK parallelism, one should ensure that the hardware
resources are not over-subscribed. Since PETSc does not natively use OpenMP, this means that phases of the computation that do not use BLAS/LAPACK will be under-subscribed,
thus under-utilizing the system. For PETSc simulations that do not use external packages, there is generally no benefit to using parallel
BLAS/LAPACK. The environment variable `OMP_NUM_THREADS` can be used to set the number of threads used by each MPI process for its shared memory parallel BLAS/LAPACK. The additional
environment variables `OMP_PROC_BIND` and `OMP_PLACES` may also need to be set appropriately for the system to obtain good parallel performance with
BLAS/LAPACK. The configure option `--with-openmp` will trigger PETSc to try to locate and use a parallel BLAS/LAPACK library.
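
A minimal sketch of such a setup, assuming a 16-core node shared by 4 MPI processes: the thread count is chosen so that processes times threads does not exceed the core count. The `spread` and `cores` values are standard OpenMP settings but are only one plausible choice, not a recommendation from this manual. The launch line with `mpiexec` and the application name `./app` is hypothetical, so it is left commented out.

```shell
# Assumed machine and job layout; adjust for the actual system.
cores=16
mpi_ranks=4

# Threads per MPI process so that mpi_ranks * threads <= cores (integer division).
threads_per_rank=$((cores / mpi_ranks))

export OMP_NUM_THREADS="$threads_per_rank"   # threads per MPI process for BLAS/LAPACK
export OMP_PROC_BIND=spread                  # one possible binding policy (assumption)
export OMP_PLACES=cores                      # bind threads to cores (assumption)

echo "ranks=$mpi_ranks OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Hypothetical launch line; ./app stands in for a PETSc application:
# mpiexec -n "$mpi_ranks" ./app
```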

Certain external packages such as MUMPS may benefit from using parallel BLAS/LAPACK operations. See the manual page `MATSOLVERMUMPS` for details on
how one can restrict the number of MPI processes while running MUMPS to utilize parallel BLAS/LAPACK.

(ch_blas_lapack_avail_libs)=

## Available BLAS/LAPACK libraries

Most systems (besides Microsoft Windows) come with pre-installed BLAS/LAPACK libraries that are satisfactory for many PETSc simulations.

The freely available Intel MKL mathematics libraries provide BLAS/LAPACK that generally perform better than the system-provided libraries
and are generally fine for most users.

For systems that do not provide BLAS/LAPACK, such as Microsoft Windows, PETSc provides the Fortran reference version via
`--download-fblaslapack` and an f2c-generated C version via `--download-f2cblaslapack` (which also supports 128-bit real number computations).
These libraries are less optimized but are useful for getting started with PETSc easily.

PETSc also provides access to OpenBLAS via the `--download-openblas` configure option. OpenBLAS uses some highly optimized operations but falls back on reference
routines for many other operations. See the OpenBLAS manual for more information. The configure option `--download-openblas` provides a full BLAS/LAPACK implementation.

BLIS does not bundle LAPACK, so PETSc's configure attempts to locate a compatible system LAPACK library to use if `--download-blis` is
selected. One can use `--download-f2cblaslapack --download-blis`; this is recommended as a portable high-performance option. If you use `--download-blis` without `--download-f2cblaslapack`, it is possible that the installed BLIS library will **not** be used! Instead, PETSc will link in some LAPACK implementation and the BLAS that comes with that implementation.
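
For example, the recommended combination could be requested as follows. The option names are taken from the text above; running `./configure` from the top of a PETSc source tree is an assumption about the build setup.

```shell
# Assumes the current directory is the top of a PETSc source tree.
# Requests BLIS together with the f2c-generated BLAS/LAPACK, the combination recommended above.
./configure --download-f2cblaslapack --download-blis
```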