# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![User manual][zenodo-badge]][zenodo-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for performance portability, run-time flexibility, and clean embedding in higher-level libraries and applications.
It offers a C99 interface as well as bindings for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly algebraic and thus applicable to other discretizations in factored form, as explained in the [user manual](https://libceed.org/en/latest/) and the API implementation portion of the [documentation](https://libceed.org/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is no longer a good representation of a high-order linear operator, with respect to both the FLOPs needed for its evaluation and the memory transfer needed for a matvec.
Thus, high-order methods require a new "format" that still represents a linear (or more generally nonlinear) operator, but not through a sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting implementations and data structures, that enable efficient operator evaluation on a variety of computational device types (CPUs, GPUs, etc.).
This new operator description is based on an algebraically [factored form](https://libceed.org/en/latest/libCEEDapi/#finite-element-operator-decomposition), which is easy to incorporate into a wide variety of applications without significant refactoring of their own discretization infrastructure.
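
As an illustration of the idea (this is a NumPy sketch, not libCEED code; the tiny 1D mesh and all names are invented for the example), a mass operator can be applied in the factored form `A = G^T B^T D B G`, where `G` restricts global dofs to elements, `B` evaluates the basis at quadrature points, and `D` applies pointwise quadrature scaling, without ever assembling the matrix `A`:

```python
import numpy as np

# Illustrative sketch (not libCEED code): apply a 1D mass operator in the
# factored form A = G^T B^T D B G, the decomposition libCEED's operator
# format is built around, without ever assembling the matrix A.
num_elem, p, q = 4, 2, 2        # elements, dofs and quadrature points per element
num_dof = num_elem + 1          # linear elements on a uniform mesh of [0, 1]
h = 1.0 / num_elem              # element width

# G: element restriction (L-vector -> E-vector), selecting each element's dofs
G = np.zeros((num_elem, p, num_dof))
for e in range(num_elem):
    G[e, 0, e] = G[e, 1, e + 1] = 1.0

# B: linear basis evaluated at 2-point Gauss quadrature on [-1, 1]
x = np.array([-1.0, 1.0]) / np.sqrt(3.0)
B = np.column_stack([(1 - x) / 2, (1 + x) / 2])    # shape (q, p)

# D: diagonal of quadrature weights (both 1) times the Jacobian h/2
D = np.eye(q) * (h / 2)

def apply_mass(u):
    """Apply A = G^T B^T D B G one factor at a time."""
    u_e = np.einsum('epn,n->ep', G, u)     # gather:  L-vector -> E-vector
    v_q = np.einsum('qp,ep->eq', B, u_e)   # basis:   dofs -> quadrature points
    v_q = np.einsum('qr,er->eq', D, v_q)   # pointwise quadrature scaling
    v_e = np.einsum('qp,eq->ep', B, v_q)   # basis transpose
    return np.einsum('epn,ep->n', G, v_e)  # scatter: E-vector -> L-vector

# Cross-check against the explicitly assembled operator
A = sum(G[e].T @ B.T @ D @ B @ G[e] for e in range(num_elem))
u = np.linspace(0.0, 1.0, num_dof)
assert np.allclose(apply_mass(u), A @ u)
```

At high order the factored application needs far fewer FLOPs and far less memory traffic than a sparse matvec, which is the motivation for the operator format described above.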

The repository is part of the [CEED software suite](http://ceed.exascaleproject.org/software/), a collection of software benchmarks, miniapps, libraries, and APIs for efficient exascale discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the [Exascale Computing Project](https://exascaleproject.org/exascale-computing-project) (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a [capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including software, applications, hardware, advanced system engineering and early testbed platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.org/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies, and with Fortran, Python, Julia, and Rust interfaces.
It can be built using:

```console
$ make
```

or, with optimization flags:

```console
$ make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran) and this makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```console
$ make AVX=1
```

or:

```console
$ make AVX=0
```

if your compiler does not support gcc-style options, if you are cross-compiling, etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory to your `make` invocation.
To enable HIP support, add `HIP_DIR=/opt/rocm` or an appropriate directory.
To store these or other arguments as defaults for future invocations of `make`, use:

```console
$ make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

### WebAssembly

libCEED can be built for WASM using [Emscripten](https://emscripten.org). For example, one can build the library and run a standalone WASM executable using

```console
$ emmake make build/ex2-surface.wasm
$ wasmer build/ex2-surface.wasm -- -s 200000
```

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```console
$ pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```console
$ julia
julia> ]
pkg> add LibCEED
```

See the [LibCEED.jl documentation](http://ceed.exascaleproject.org/libCEED-julia-docs/dev/) for more information.

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = "0.11.0"
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```console
$ make test
```

or, using the `prove` tool distributed with Perl (recommended):

```console
$ make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource              | Backend                                           | Deterministic Capable |
| :---                       | :---                                              | :---:                 |
||
| **CPU Native**             |
| `/cpu/self/ref/serial`     | Serial reference implementation                   | Yes                   |
| `/cpu/self/ref/blocked`    | Blocked reference implementation                  | Yes                   |
| `/cpu/self/opt/serial`     | Serial optimized C implementation                 | Yes                   |
| `/cpu/self/opt/blocked`    | Blocked optimized C implementation                | Yes                   |
| `/cpu/self/avx/serial`     | Serial AVX implementation                         | Yes                   |
| `/cpu/self/avx/blocked`    | Blocked AVX implementation                        | Yes                   |
||
| **CPU Valgrind**           |
| `/cpu/self/memcheck/*`     | Memcheck backends, undefined value checks         | Yes                   |
||
| **CPU LIBXSMM**            |
| `/cpu/self/xsmm/serial`    | Serial LIBXSMM implementation                     | Yes                   |
| `/cpu/self/xsmm/blocked`   | Blocked LIBXSMM implementation                    | Yes                   |
||
| **CUDA Native**            |
| `/gpu/cuda/ref`            | Reference pure CUDA kernels                       | Yes                   |
| `/gpu/cuda/shared`         | Optimized pure CUDA kernels using shared memory   | Yes                   |
| `/gpu/cuda/gen`            | Optimized pure CUDA kernels using code generation | No                    |
||
| **HIP Native**             |
| `/gpu/hip/ref`             | Reference pure HIP kernels                        | Yes                   |
| `/gpu/hip/shared`          | Optimized pure HIP kernels using shared memory    | Yes                   |
| `/gpu/hip/gen`             | Optimized pure HIP kernels using code generation  | No                    |
||
| **MAGMA**                  |
| `/gpu/cuda/magma`          | CUDA MAGMA kernels                                | No                    |
| `/gpu/cuda/magma/det`      | CUDA MAGMA kernels (deterministic)                | Yes                   |
| `/gpu/hip/magma`           | HIP MAGMA kernels                                 | No                    |
| `/gpu/hip/magma/det`       | HIP MAGMA kernels (deterministic)                 | Yes                   |
||
| **OCCA**                   |
| `/*/occa`                  | Selects backend based on available OCCA modes     | Yes                   |
| `/cpu/self/occa`           | OCCA backend with serial CPU kernels              | Yes                   |
| `/cpu/openmp/occa`         | OCCA backend with OpenMP kernels                  | Yes                   |
| `/cpu/dpcpp/occa`          | OCCA backend with DPC++ kernels                   | Yes                   |
| `/gpu/cuda/occa`           | OCCA backend with CUDA kernels                    | Yes                   |
| `/gpu/hip/occa`            | OCCA backend with HIP kernels                     | Yes                   |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes with a smaller number of high-order elements.
The `/cpu/self/*/blocked` backends process blocked batches of eight interlaced elements and are intended for meshes with higher numbers of elements.
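
The serial-versus-blocked distinction can be sketched with a toy NumPy example (illustrative only; the real backends use interlaced storage and vectorized kernels rather than NumPy): the serial strategy applies the basis one element at a time, while the blocked strategy services a batch of eight elements with a single fused product:

```python
import numpy as np

# Toy sketch of serial vs. blocked element processing (illustrative only;
# libCEED's real backends use interlaced storage and vectorized kernels).
# B is a basis-evaluation matrix taking p dofs to q quadrature points.
num_elem, p, q, block = 32, 4, 4, 8
rng = np.random.default_rng(0)
B = rng.random((q, p))
u_e = rng.random((num_elem, p))      # E-vector: per-element dof values

# Serial strategy: one small matrix-vector product per element
serial = np.stack([B @ u_e[e] for e in range(num_elem)])

# Blocked strategy: batch 8 elements and apply B to the whole batch at once,
# amortizing loop overhead and exposing a vectorizable batch dimension
blocked = np.empty((num_elem, q))
for start in range(0, num_elem, block):
    batch = u_e[start:start + block]               # (block, p)
    blocked[start:start + block] = batch @ B.T     # one fused product per block

assert np.allclose(serial, blocked)
```

Both strategies compute the same values; the blocked layout simply trades per-element overhead for larger, more vectorization-friendly operations, which pays off as the element count grows.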

The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool to help verify that user QFunctions have no undefined values.
To use, run your code with Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck/serial`.
A 'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in the serial mode if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package to provide vectorized CPU performance.
If linking MKL and LIBXSMM is desired but the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP.
They are based on the `/gpu/cuda/*` backends.
ROCm version 4.2 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent directory, or set `MAGMA_DIR` to the proper location.
MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is only built for either CUDA or HIP.
The corresponding set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#` after the resource name.
For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide cross-platform performance.
To enable the OCCA backend, the environment variable `OCCA_DIR` must point to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib` (by default, `OCCA_DIR` is set to `../occa`).
OCCA version 1.4.0 or newer is required.

Users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends which are capable of generating reproducible results, with the proper compilation options, are highlighted in the list above.
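
The root cause is that floating-point addition is not associative, so the order in which concurrent atomic updates land changes the low-order bits of the result. A quick Python illustration:

```python
# Floating-point addition is not associative, so summing the same values in
# a different order (as concurrent atomicAdd-style updates may do) can change
# the result, even though every order is "correct" to rounding.
vals = [0.1, 0.2, 0.3]
left_to_right = (vals[0] + vals[1]) + vals[2]   # one reduction order
right_to_left = vals[0] + (vals[1] + vals[2])   # another reduction order
assert left_to_right != right_to_left           # 0.6000000000000001 vs 0.6
```

Deterministic backends fix the reduction order (at some cost in performance), which is what makes bit-for-bit reproducible results possible.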

## Examples

libCEED comes with several examples of its usage, ranging from standalone C codes in the `/examples/ceed` directory to examples based on external packages, such as MFEM, PETSc, and Nek5000.
Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and `NEK5K_DIR` variables and run:

```console
$ cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
$ cd ceed/
$ make
$ ./ex1-volume -ceed /cpu/self
$ ./ex1-volume -ceed /gpu/cuda
$ ./ex2-surface -ceed /cpu/self
$ ./ex2-surface -ceed /gpu/cuda
$ cd ..

# MFEM+libCEED examples on CPU and GPU
$ cd mfem/
$ make
$ ./bp1 -ceed /cpu/self -no-vis
$ ./bp3 -ceed /gpu/cuda -no-vis
$ cd ..

# Nek5000+libCEED examples on CPU and GPU
$ cd nek/
$ make
$ ./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
$ ./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
$ cd ..

# PETSc+libCEED examples on CPU and GPU
$ cd petsc/
$ make
$ ./bps -problem bp1 -ceed /cpu/self
$ ./bps -problem bp2 -ceed /gpu/cuda
$ ./bps -problem bp3 -ceed /cpu/self
$ ./bps -problem bp4 -ceed /gpu/cuda
$ ./bps -problem bp5 -ceed /cpu/self
$ ./bps -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./bpsraw -problem bp1 -ceed /cpu/self
$ ./bpsraw -problem bp2 -ceed /gpu/cuda
$ ./bpsraw -problem bp3 -ceed /cpu/self
$ ./bpsraw -problem bp4 -ceed /gpu/cuda
$ ./bpsraw -problem bp5 -ceed /cpu/self
$ ./bpsraw -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./bpssphere -problem bp1 -ceed /cpu/self
$ ./bpssphere -problem bp2 -ceed /gpu/cuda
$ ./bpssphere -problem bp3 -ceed /cpu/self
$ ./bpssphere -problem bp4 -ceed /gpu/cuda
$ ./bpssphere -problem bp5 -ceed /cpu/self
$ ./bpssphere -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./area -problem cube -ceed /cpu/self -degree 3
$ ./area -problem cube -ceed /gpu/cuda -degree 3
$ ./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
$ ./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
$ cd ..

$ cd fluids/
$ make
$ ./navierstokes -ceed /cpu/self -degree 1
$ ./navierstokes -ceed /gpu/cuda -degree 1
$ cd ..

$ cd solids/
$ make
$ ./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
$ ./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
$ cd ..
```
For the last example shown, sample meshes to be used in place of `[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends enabled.
Depending on the available backends, other CEED resource specifiers can be provided with the `-ceed` option.
Other command line arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```console
$ make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory and can be viewed using the following commands (requires Python with Matplotlib):

```console
$ cd benchmarks
$ python postprocess-plot.py petsc-bps-bp1-*-output.txt
$ python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks which may take some time to run.
Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```console
$ make install prefix=/path/to/install/dir
```

or (e.g., if creating packages):

```console
$ make install prefix=/usr DESTDIR=/packaging/path
```

To build and install in separate steps, run:

```console
$ make for_install=1 prefix=/path/to/install/dir
$ make install prefix=/path/to/install/dir
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags for all languages can be set using the likes of `OPT='-O3 -march=native'`.
Use `STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```console
$ pip install libceed
```

with the desired setuptools options, such as `--user`.

### pkg-config

In addition to the library and headers, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config) file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if `$prefix` is a standard location or you set the environment variable `PKG_CONFIG_PATH`:

```console
$ cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED.
This can be used with the source or installed directories.
Most build systems have support for pkg-config.

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov) or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you utilize libCEED, please cite:

```bibtex
@article{libceed-joss-paper,
  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal      = {Journal of Open Source Software},
  year         = {2021},
  publisher    = {The Open Journal},
  volume       = {6},
  number       = {63},
  pages        = {2945},
  doi          = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author       = {Abdelfattah, Ahmad and
                  Barra, Valeria and
                  Beams, Natalie and
                  Brown, Jed and
                  Camier, Jean-Sylvain and
                  Dobrev, Veselin and
                  Dudouit, Yohann and
                  Ghaffari, Leila and
                  Kolev, Tzanio and
                  Medina, David and
                  Pazner, Will and
                  Ratnayaka, Thilina and
                  Shakeri, Rezgar and
                  Thompson, Jeremy L and
                  Tomov, Stanimire and
                  Wright III, James},
  title        = {{libCEED} User Manual},
  month        = dec,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {0.11.0},
  doi          = {10.5281/zenodo.7480454}
}
```

For libCEED's Python interface, please cite:

```bibtex
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the `doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.

[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
[github-link]: https://github.com/CEED/libCEED/actions
[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
[codecov-link]: https://codecov.io/gh/CEED/libCEED/
[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
[license-link]: https://opensource.org/licenses/BSD-2-Clause
[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
[doc-link]: https://libceed.org/en/latest/?badge=latest
[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
[joss-link]: https://doi.org/10.21105/joss.02945
[binder-badge]: http://mybinder.org/badge_logo.svg
[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/python/tutorial-0-ceed.ipynb
[zenodo-badge]: https://zenodo.org/badge/DOI/10.5281/zenodo.4302736.svg
[zenodo-link]: https://doi.org/10.5281/zenodo.4302736
