# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for performance portability, run-time flexibility, and clean embedding in higher level libraries and applications.
It offers a C99 interface as well as bindings for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly algebraic and thus applicable to other discretizations in factored form, as explained in the [user manual](https://libceed.org/en/latest/) and API implementation portion of the [documentation](https://libceed.org/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is no longer a good representation of a high-order linear operator, with respect to both the FLOPs needed for its evaluation and the memory transfer needed for a matvec.
Thus, high-order methods require a new "format" that still represents a linear (or more generally non-linear) operator, but not through a sparse matrix.
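
To make the cost argument concrete, the following is a rough operation-count sketch, not libCEED code. It assumes a scalar mass operator on a single hexahedral element of degree `p` with `p + 1` quadrature points per dimension, and compares applying a dense element matrix with a sum-factorized (tensor-product) evaluation. The function names and the simplified cost model are assumptions made for illustration only.

```python
def dense_element_cost(p):
    """Stored entries and FLOPs to apply a dense element matrix (one hex element)."""
    nodes = (p + 1) ** 3          # nodes per hexahedral element
    entries = nodes * nodes       # dense element matrix entries
    flops = 2 * entries           # one multiply and one add per entry
    return entries, flops


def sum_factorized_cost(p):
    """Stored values and FLOPs for a tensor-product evaluation (simplified model)."""
    n1 = p + 1                    # nodes (and quadrature points) per dimension
    contraction = 2 * n1 ** 4     # 1D matrix applied along one axis of an n1^3 tensor
    # Basis action and its transpose (3 axes each) plus the pointwise scaling.
    flops = 2 * 3 * contraction + n1 ** 3
    stored = n1 ** 3              # one quadrature weight/geometry value per point
    return stored, flops


if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(p, dense_element_cost(p), sum_factorized_cost(p))
```

Under these assumptions the dense approach grows like `(p + 1)^6` per element in both storage and work, while the factored evaluation grows like `(p + 1)^3` in storage and `(p + 1)^4` in work, which is why the gap widens quickly with the degree.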

The goal of libCEED is to propose such a format, as well as supporting implementations and data structures, that enable efficient operator evaluation on a variety of computational device types (CPUs, GPUs, etc.).
This new operator description is based on an algebraically [factored form](https://libceed.org/en/latest/libCEEDapi/#finite-element-operator-decomposition), which is easy to incorporate in a wide variety of applications, without significant refactoring of their own discretization infrastructure.
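
As a brief summary of the notation in the linked documentation (see the user manual for the precise definitions), the factored form writes the operator $A$ as a product of a parallel (process) decomposition $P$, an element restriction $G$, a basis evaluation at quadrature points $B$, and a pointwise operation at quadrature points $D$:

$$
A = P^T G^T B^T D B G P
$$

Read right to left, this takes a global vector to each process, then to each element, then to quadrature points, applies the pointwise physics, and accumulates the result back through the transposed steps.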

The repository is part of the [CEED software suite](http://ceed.exascaleproject.org/software/), a collection of software benchmarks, miniapps, libraries and APIs for efficient exascale discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the [Exascale Computing Project](https://exascaleproject.org/exascale-computing-project) (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a [capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including software, applications, hardware, advanced system engineering and early testbed platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.org/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies, and with Fortran, Python, Julia, and Rust interfaces.
It can be built using:

```console
$ make
```

or, with optimization flags:

```console
$ make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran) and this makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX instruction set using gcc-style compiler options for the host.
In some situations (for example, if your compiler does not support gcc-style options or if you are cross compiling), AVX support may need to be specified manually via:

```console
$ make AVX=1
```

or:

```console
$ make AVX=0
```

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory to your `make` invocation.
To enable HIP support, add `ROCM_DIR=/opt/rocm` or an appropriate directory.
To store these or other arguments as defaults for future invocations of `make`, use:

```console
$ make configure CUDA_DIR=/usr/local/cuda ROCM_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

### WebAssembly

libCEED can be built for WASM using [Emscripten](https://emscripten.org).
For example, one can build the library and run a standalone WASM executable using:

```console
$ emmake make build/ex2-surface.wasm
$ wasmer build/ex2-surface.wasm -- -s 200000
```

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```console
$ pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```console
$ julia
julia> ]
pkg> add LibCEED
```

See the [LibCEED.jl documentation](http://ceed.exascaleproject.org/libCEED-julia-docs/dev/) for more information.

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = "0.11.0"
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```console
$ make test
```

or, using the `prove` tool distributed with Perl (recommended):

```console
$ make prove
```
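
If you want to post-process the TAP stream yourself rather than use `prove`, a small script is enough for a pass/fail summary. The sketch below is one way to do it; the helper name `summarize_tap` and the sample lines are made up for illustration and are not actual libCEED test output.

```python
import re


def summarize_tap(text):
    """Count passing, failing, and skipped test lines in TAP output."""
    counts = {"ok": 0, "not ok": 0, "skip": 0}
    for line in text.splitlines():
        match = re.match(r"^(not ok|ok)\b(.*)$", line.strip())
        if not match:
            continue  # plan lines ("1..N"), diagnostics ("# ..."), etc.
        status, rest = match.groups()
        if re.search(r"#\s*skip", rest, re.IGNORECASE):
            counts["skip"] += 1
        else:
            counts[status] += 1
    return counts


# Hypothetical sample; real test names and messages will differ.
SAMPLE = """1..3
ok 1 - first-test /cpu/self/ref/serial
not ok 2 - second-test /cpu/self/ref/serial
ok 3 - third-test /gpu/cuda/ref # SKIP backend not built
"""

if __name__ == "__main__":
    print(summarize_tap(SAMPLE))
```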

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource              | Backend                                           | Deterministic Capable |
| :---                       | :---                                              | :---:                 |
||
| **CPU Native**             |
| `/cpu/self/ref/serial`     | Serial reference implementation                   | Yes                   |
| `/cpu/self/ref/blocked`    | Blocked reference implementation                  | Yes                   |
| `/cpu/self/opt/serial`     | Serial optimized C implementation                 | Yes                   |
| `/cpu/self/opt/blocked`    | Blocked optimized C implementation                | Yes                   |
| `/cpu/self/avx/serial`     | Serial AVX implementation                         | Yes                   |
| `/cpu/self/avx/blocked`    | Blocked AVX implementation                        | Yes                   |
||
| **CPU Valgrind**           |
| `/cpu/self/memcheck/*`     | Memcheck backends, undefined value checks         | Yes                   |
||
| **CPU LIBXSMM**            |
| `/cpu/self/xsmm/serial`    | Serial LIBXSMM implementation                     | Yes                   |
| `/cpu/self/xsmm/blocked`   | Blocked LIBXSMM implementation                    | Yes                   |
||
| **CUDA Native**            |
| `/gpu/cuda/ref`            | Reference pure CUDA kernels                       | Yes                   |
| `/gpu/cuda/shared`         | Optimized pure CUDA kernels using shared memory   | Yes                   |
| `/gpu/cuda/gen`            | Optimized pure CUDA kernels using code generation | No                    |
||
| **HIP Native**             |
| `/gpu/hip/ref`             | Reference pure HIP kernels                        | Yes                   |
| `/gpu/hip/shared`          | Optimized pure HIP kernels using shared memory    | Yes                   |
| `/gpu/hip/gen`             | Optimized pure HIP kernels using code generation  | No                    |
||
| **MAGMA**                  |
| `/gpu/cuda/magma`          | CUDA MAGMA kernels                                | No                    |
| `/gpu/cuda/magma/det`      | CUDA MAGMA kernels                                | Yes                   |
| `/gpu/hip/magma`           | HIP MAGMA kernels                                 | No                    |
| `/gpu/hip/magma/det`       | HIP MAGMA kernels                                 | Yes                   |
||
| **OCCA**                   |
| `/*/occa`                  | Selects backend based on available OCCA modes     | Yes                   |
| `/cpu/self/occa`           | OCCA backend with serial CPU kernels              | Yes                   |
| `/cpu/openmp/occa`         | OCCA backend with OpenMP kernels                  | Yes                   |
| `/cpu/dpcpp/occa`          | OCCA backend with DPC++ kernels                   | Yes                   |
| `/gpu/cuda/occa`           | OCCA backend with CUDA kernels                    | Yes                   |
| `/gpu/hip/occa`            | OCCA backend with HIP kernels                     | Yes                   |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes with a smaller number of high order elements.
The `/cpu/self/*/blocked` backends process blocked batches of eight interlaced elements and are intended for meshes with higher numbers of elements.
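
As a rough illustration of what "blocked batches of interlaced elements" can mean, the sketch below groups per-element node values into fixed-size blocks and reorders each block so that the values of one node for all elements in the block are adjacent. A layout of this kind lets vector lanes operate across elements. This is not libCEED's implementation: the function name `interlace_blocks` is hypothetical, and padding the last block by repeating its final element is an assumption made to keep the example simple.

```python
def interlace_blocks(elements, block_size=8):
    """Group elements into blocks and interlace node values within each block.

    elements: list of equal-length lists, one list of node values per element.
    Returns one flat list per block, ordered node-major, so the block_size
    values that belong to the same node sit next to each other.
    """
    if not elements:
        return []
    num_nodes = len(elements[0])
    blocks = []
    for start in range(0, len(elements), block_size):
        block = list(elements[start:start + block_size])
        # Assumption: pad a short final block by repeating its last element.
        while len(block) < block_size:
            block.append(block[-1])
        blocks.append([elem[node] for node in range(num_nodes) for elem in block])
    return blocks


if __name__ == "__main__":
    # Three elements with two nodes each, using a block size of 2 for brevity.
    print(interlace_blocks([[1, 2], [3, 4], [5, 6]], block_size=2))
```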

The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool to help verify that user QFunctions have no undefined values.
To use, run your code with Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck`.
A 'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in the serial mode if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package to provide vectorized CPU performance.
If linking MKL and LIBXSMM is desired but the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP.
They are based on the `/gpu/cuda/*` backends.
ROCm version 4.2 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent directory, or set `MAGMA_DIR` to the proper location.
MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is only built for either CUDA or HIP.
The corresponding set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by appending `:device_id=#` to the resource name.
For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide cross platform performance.
To enable the OCCA backend, the environment variable `OCCA_DIR` must point to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib`.
By default, `OCCA_DIR` is set to `../occa`.
OCCA version 1.4.0 or newer is required.

Users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`
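
Both examples follow the same shape: a backend path, then a colon, then comma-separated `key=value` options. If an application needs to inspect such a string before handing it to libCEED, a small helper along the following lines may be enough. The function `split_resource` is hypothetical and is not part of libCEED or its Python bindings; it only covers the syntax of the two examples shown above.

```python
def split_resource(resource):
    """Split a CEED resource string into (backend_path, options_dict)."""
    path, _, option_text = resource.partition(":")
    options = {}
    for item in filter(None, option_text.split(",")):
        key, _, value = item.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        options[key.strip()] = value.strip().strip("'\"")
    return path, options


if __name__ == "__main__":
    print(split_resource("/gpu/cuda/gen:device_id=1"))
    print(split_resource("/*/occa:mode='CUDA',device_id=0"))
```

Option values are returned as strings here; converting `device_id` to an integer is left to the caller.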

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends that are capable of generating reproducible results, with the proper compilation options, are marked "Yes" in the Deterministic Capable column of the table above.
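
The reason atomic accumulation can break reproducibility is that floating-point addition is not associative, so the result depends on the order in which contributions arrive. The following standalone sketch (plain Python, unrelated to any libCEED backend) sums the same three values in two different orders; the values are chosen only to make the effect obvious.

```python
def accumulate(values):
    """Sum values left to right, as a sequence of individual additions."""
    total = 0.0
    for value in values:
        total += value
    return total


if __name__ == "__main__":
    # Same contributions, two arrival orders.
    print(accumulate([1e16, 1.0, -1e16]))  # the 1.0 is absorbed by rounding
    print(accumulate([1e16, -1e16, 1.0]))  # the large terms cancel first
```

In double precision the first ordering yields `0.0` and the second yields `1.0`, even though the inputs are identical.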

## Examples

libCEED comes with several examples of its usage, ranging from standalone C codes in the `/examples/ceed` directory to examples based on external packages, such as MFEM, PETSc, and Nek5000.
Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and `NEK5K_DIR` variables and run:

```console
$ cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
$ cd ceed/
$ make
$ ./ex1-volume -ceed /cpu/self
$ ./ex1-volume -ceed /gpu/cuda
$ ./ex2-surface -ceed /cpu/self
$ ./ex2-surface -ceed /gpu/cuda
$ cd ..

# MFEM+libCEED examples on CPU and GPU
$ cd mfem/
$ make
$ ./bp1 -ceed /cpu/self -no-vis
$ ./bp3 -ceed /gpu/cuda -no-vis
$ cd ..

# Nek5000+libCEED examples on CPU and GPU
$ cd nek/
$ make
$ ./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
$ ./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
$ cd ..

# PETSc+libCEED examples on CPU and GPU
$ cd petsc/
$ make
$ ./bps -problem bp1 -ceed /cpu/self
$ ./bps -problem bp2 -ceed /gpu/cuda
$ ./bps -problem bp3 -ceed /cpu/self
$ ./bps -problem bp4 -ceed /gpu/cuda
$ ./bps -problem bp5 -ceed /cpu/self
$ ./bps -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./bpsraw -problem bp1 -ceed /cpu/self
$ ./bpsraw -problem bp2 -ceed /gpu/cuda
$ ./bpsraw -problem bp3 -ceed /cpu/self
$ ./bpsraw -problem bp4 -ceed /gpu/cuda
$ ./bpsraw -problem bp5 -ceed /cpu/self
$ ./bpsraw -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./bpssphere -problem bp1 -ceed /cpu/self
$ ./bpssphere -problem bp2 -ceed /gpu/cuda
$ ./bpssphere -problem bp3 -ceed /cpu/self
$ ./bpssphere -problem bp4 -ceed /gpu/cuda
$ ./bpssphere -problem bp5 -ceed /cpu/self
$ ./bpssphere -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./area -problem cube -ceed /cpu/self -degree 3
$ ./area -problem cube -ceed /gpu/cuda -degree 3
$ ./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
$ ./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
$ cd ..

$ cd fluids/
$ make
$ ./navierstokes -ceed /cpu/self -degree 1
$ ./navierstokes -ceed /gpu/cuda -degree 1
$ cd ..

$ cd solids/
$ make
$ ./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
$ ./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
$ cd ..
```

For the last example shown, sample meshes to be used in place of `[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends enabled.
Depending on the available backends, other CEED resource specifiers can be provided with the `-ceed` option.
Other command line arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```console
$ make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory and can be viewed using the following commands (requires Python with matplotlib):

```console
$ cd benchmarks
$ python postprocess-plot.py petsc-bps-bp1-*-output.txt
$ python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks which may take some time to run.
Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```console
$ make install prefix=/path/to/install/dir
```

or (e.g., if creating packages):

```console
$ make install prefix=/usr DESTDIR=/packaging/path
```

To build and install in separate steps, run:

```console
$ make for_install=1 prefix=/path/to/install/dir
$ make install prefix=/path/to/install/dir
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags for all languages can be set using, for example, `OPT='-O3 -march=native'`.
Use `STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```console
$ pip install libceed
```

with the desired pip options, such as `--user`.

### pkg-config

In addition to the library and header files, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config) file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if `$prefix` is a standard location or you set the environment variable `PKG_CONFIG_PATH`:

```console
$ cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED.
This can be used with the source or installed directories.
Most build systems have support for pkg-config.

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov) or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you utilize libCEED, please cite:

```bibtex
@article{libceed-joss-paper,
  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal      = {Journal of Open Source Software},
  year         = {2021},
  publisher    = {The Open Journal},
  volume       = {6},
  number       = {63},
  pages        = {2945},
  doi          = {10.21105/joss.02945}
}
```

The archival copy of the libCEED user manual is maintained on [Zenodo](https://doi.org/10.5281/zenodo.4302736).
To cite the user manual:

```bibtex
@misc{libceed-user-manual,
  author       = {Abdelfattah, Ahmad and
                  Barra, Valeria and
                  Beams, Natalie and
                  Brown, Jed and
                  Camier, Jean-Sylvain and
                  Dobrev, Veselin and
                  Dudouit, Yohann and
                  Ghaffari, Leila and
                  Kolev, Tzanio and
                  Medina, David and
                  Pazner, Will and
                  Ratnayaka, Thilina and
                  Shakeri, Rezgar and
                  Thompson, Jeremy L and
                  Tomov, Stanimire and
                  Wright III, James},
  title        = {{libCEED} User Manual},
  month        = dec,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {0.11.0},
  doi          = {10.5281/zenodo.7480454}
}
```

For libCEED's Python interface please cite:

```bibtex
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the `doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless otherwise stated in the file:

> Copyright (c) 2017-2023, Lawrence Livermore National Security, LLC and other CEED contributors.
> All rights reserved.

See files LICENSE and NOTICE for details.

[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
[github-link]: https://github.com/CEED/libCEED/actions
[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
[codecov-link]: https://codecov.io/gh/CEED/libCEED/
[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
[license-link]: https://opensource.org/licenses/BSD-2-Clause
[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
[doc-link]: https://libceed.org/en/latest/?badge=latest
[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
[joss-link]: https://doi.org/10.21105/joss.02945
[binder-badge]: http://mybinder.org/badge_logo.svg
[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/python/tutorial-0-ceed.ipynb