# MAGMA Backend Autotuning (Non-tensor Basis)

The `magma` backend uses specialized GPU kernels for a non-tensor basis with
`P`, `Q` less than a prescribed value, and above this cutoff uses a standard
library GEMM implementation. The specialized kernels have a single tunable
blocking factor parameter, `NB`, whose optimal value varies with `P` and `Q` as
well as the number of elements `N`. This folder contains the tuning data, in
header files called `<ARCH>_rtc.h`, where `<ARCH>` is the GPU name, as well as a
simple C++ program (`tuning.cpp`) and Python driver (`generate_tuning.py`) to
generate the optimal `NB` selections for a new target architecture.

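As a rough mental model of how tuning data of this kind could be consumed, the Python sketch below dispatches between a specialized kernel and GEMM and looks up `NB` from the nearest tuned `(P, Q, N)` point. Everything in it is hypothetical: the record layout, the cutoff value `HYPOTHETICAL_CUTOFF`, the distance metric, and the function names are assumptions for illustration, not the backend's actual data format or selection logic.

```python
from typing import NamedTuple


class TuningRecord(NamedTuple):
    """One hypothetical tuning entry: best NB measured at a (P, Q, N) point."""
    P: int
    Q: int
    N: int
    NB: int


# Made-up records for illustration only; real data lives in <ARCH>_rtc.h.
RECORDS = [
    TuningRecord(P=4, Q=4, N=1024, NB=16),
    TuningRecord(P=4, Q=4, N=65536, NB=8),
    TuningRecord(P=10, Q=20, N=1024, NB=4),
]

# Assumed cutoff; the real prescribed value is not stated in this README.
HYPOTHETICAL_CUTOFF = 40


def select_nb(P, Q, N, records=RECORDS, default_nb=1):
    """Return NB of the record closest to (P, Q, N), or default_nb if empty."""
    if not records:
        return default_nb

    def distance(record):
        # Assumed metric: match P and Q first, then prefer the closest N.
        return (abs(record.P - P) + abs(record.Q - Q), abs(record.N - N))

    return min(records, key=distance).NB


def choose_path(P, Q, N, cutoff=HYPOTHETICAL_CUTOFF):
    """Use the specialized kernel below the cutoff, otherwise fall back to GEMM."""
    if P < cutoff and Q < cutoff:
        return ("specialized", select_nb(P, Q, N))
    return ("gemm", None)
```
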
## Generating Autotuning Data

A sample run to generate the tuning data for an A100 GPU, considering values of
`NB` from 1 to 32 and saving the result to `a100_rtc.h`, is:

```sh
python generate_tuning.py -arch a100 -max-nb 32 -build-cmd "make" -ceed "/gpu/cuda/magma"
```

The `-build-cmd` parameter specifies the command used to compile the libCEED
library. For example, this may be a build script which calls `make` internally
with the desired parameters, or simply `make` if a previous call to
`make configure` has configured the build. Finally, the `-ceed` parameter
specifies the backend to use, typically one of `/gpu/cuda/magma` or
`/gpu/hip/magma`.

Alternatively, the `tuning` program can be built and run on its own to benchmark
the basis application for a given backend. Run `make tuning` from this directory
and call the program as:

```sh
./tuning "/gpu/cuda/magma"
```

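Once timings are available, choosing `NB` reduces to picking the fastest candidate for each `(P, Q, N)` case. The sketch below illustrates that selection step in Python. The `(P, Q, N, NB, seconds)` sample layout and the function name `best_nb_per_case` are assumptions made for illustration; they do not describe the actual output of `tuning` or the internals of `generate_tuning.py`.

```python
def best_nb_per_case(samples):
    """Map each (P, Q, N) case to the NB with the lowest measured time.

    `samples` is an iterable of (P, Q, N, NB, seconds) tuples. This layout is
    hypothetical and chosen only to keep the example self-contained.
    """
    best = {}
    for P, Q, N, NB, seconds in samples:
        key = (P, Q, N)
        # Strict comparison: on a tie, the first NB encountered is kept.
        if key not in best or seconds < best[key][1]:
            best[key] = (NB, seconds)
    return {key: nb for key, (nb, _) in best.items()}


# Made-up timings for two cases, each benchmarked with three NB values.
example_samples = [
    (4, 4, 1024, 1, 3.0e-4),
    (4, 4, 1024, 8, 1.2e-4),
    (4, 4, 1024, 16, 1.5e-4),
    (10, 20, 1024, 1, 9.0e-4),
    (10, 20, 1024, 4, 4.0e-4),
    (10, 20, 1024, 8, 4.0e-4),
]
example_best = best_nb_per_case(example_samples)
```
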
Note that in order for the benchmarks to make sense for `magma` backends, the
`ceed_magma_queue_sync` in `ceed-magma.h` should be set to
`cudaDeviceSynchronize()` or `hipDeviceSynchronize()`.
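
A plausible reason for this requirement is that GPU kernel launches return control to the host before the kernel has finished, so a host-side timer that does not wait for the device may capture only the launch overhead. The Python sketch below is an analogy only, using a thread pool in place of a GPU: `fake_kernel`, `time_launch`, and the 0.05 s duration are all hypothetical, and no libCEED or MAGMA code is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration):
    """Stand-in for a GPU kernel: simply sleeps for `duration` seconds."""
    time.sleep(duration)


def time_launch(executor, duration, synchronize):
    """Time one asynchronous 'launch', optionally waiting for completion."""
    start = time.perf_counter()
    future = executor.submit(fake_kernel, duration)  # returns immediately
    if synchronize:
        future.result()  # plays the role of a device synchronize
    elapsed = time.perf_counter() - start
    future.result()  # always drain so the next measurement starts clean
    return elapsed


with ThreadPoolExecutor(max_workers=1) as pool:
    unsynced = time_launch(pool, 0.05, synchronize=False)
    synced = time_launch(pool, 0.05, synchronize=True)
```

Here `unsynced` measures little more than the cost of submitting the work, while `synced` includes the time the work actually takes, which is the quantity a benchmark needs.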