# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for
performance portability, run-time flexibility, and clean embedding in higher
level libraries and applications. It offers a C99 interface as well as bindings
for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [user manual](https://libceed.org/en/latest/) and
API implementation portion of the
[documentation](https://libceed.org/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, with respect to
both the FLOPs needed for its evaluation and the memory transfer needed for a
matvec. Thus, high-order methods require a new "format" that still represents a
linear (or more generally non-linear) operator, but not through a sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on an algebraically
[factored form](https://libceed.org/en/latest/libCEEDapi/#finite-element-operator-decomposition),
which is easy to incorporate into a wide variety of applications without significant
refactoring of their own discretization infrastructure.
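
To make the factored form concrete, the following pure-Python sketch (illustrative only, not the libCEED API) applies a 2D tensor-product operator `B ⊗ B` two ways: once through the assembled dense matrix, and once by sum factorization, contracting one dimension at a time. For `n` nodes per dimension the assembled apply costs O(n⁴) operations while the factored apply costs O(n³), a saving that grows with polynomial order:

```python
import random

random.seed(0)
n = 4
# 1D basis matrix B and a vector u on the n x n tensor-product nodes
B = [[random.random() for _ in range(n)] for _ in range(n)]
u = [random.random() for _ in range(n * n)]

# Assembled path: form K = B ⊗ B (an n^2 x n^2 matrix), then apply it -- O(n^4)
K = [[B[i][k] * B[j][l] for k in range(n) for l in range(n)]
     for i in range(n) for j in range(n)]
v_dense = [sum(K[r][c] * u[c] for c in range(n * n)) for r in range(n * n)]

# Factored path: contract one dimension at a time -- O(n^3)
U = [[u[k * n + l] for l in range(n)] for k in range(n)]
T = [[sum(B[j][l] * U[k][l] for l in range(n)) for j in range(n)]
     for k in range(n)]                       # apply B along the second index
V = [[sum(B[i][k] * T[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]                       # apply B along the first index
v_fact = [V[i][j] for i in range(n) for j in range(n)]

assert all(abs(a - b) < 1e-12 for a, b in zip(v_dense, v_fact))
```

libCEED's backends implement this kind of factorization (and its 3D analogue) behind the operator interface, so applications never assemble the global matrix.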

The repository is part of the
[CEED software suite](http://ceed.exascaleproject.org/software/), a collection of
software benchmarks, miniapps, libraries and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the
[Exascale Computing Project](https://exascaleproject.org/exascale-computing-project)
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a
[capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering and early testbed
platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.org/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies, and
with Fortran, Python, Julia, and Rust interfaces.  It can be built using:

```
make
```

or, with optimization flags:

```
make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran) and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```
make AVX=1
```

or:

```
make AVX=0
```

if your compiler does not support gcc-style options, if you are cross
compiling, etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory
to your `make` invocation. To enable HIP support, add `HIP_DIR=/opt/rocm` or
an appropriate directory. To store these or other arguments as defaults for
future invocations of `make`, use:

```
make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.
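
The generated `config.mk` is a plain makefile fragment; with the invocation above its contents would resemble the following (values and exact layout illustrative):

```make
# config.mk -- defaults recorded by `make configure`
CUDA_DIR = /usr/local/cuda
HIP_DIR = /opt/rocm
OPT = -O3 -march=znver2
```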

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```
pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```
$ julia
julia> ]
pkg> add LibCEED
```

See the [LibCEED.jl documentation](http://ceed.exascaleproject.org/libCEED-julia-docs/dev/)
for more information.

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```
make test
```

or, using the `prove` tool distributed with Perl (recommended):

```
make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource              | Backend                                           | Deterministic Capable |
| :---                       | :---                                              | :---:                 |
||
| **CPU Native**             |
| `/cpu/self/ref/serial`     | Serial reference implementation                   | Yes                   |
| `/cpu/self/ref/blocked`    | Blocked reference implementation                  | Yes                   |
| `/cpu/self/opt/serial`     | Serial optimized C implementation                 | Yes                   |
| `/cpu/self/opt/blocked`    | Blocked optimized C implementation                | Yes                   |
| `/cpu/self/avx/serial`     | Serial AVX implementation                         | Yes                   |
| `/cpu/self/avx/blocked`    | Blocked AVX implementation                        | Yes                   |
||
| **CPU Valgrind**           |
| `/cpu/self/memcheck/*`     | Memcheck backends, undefined value checks         | Yes                   |
||
| **CPU LIBXSMM**            |
| `/cpu/self/xsmm/serial`    | Serial LIBXSMM implementation                     | Yes                   |
| `/cpu/self/xsmm/blocked`   | Blocked LIBXSMM implementation                    | Yes                   |
||
| **CUDA Native**            |
| `/gpu/cuda/ref`            | Reference pure CUDA kernels                       | Yes                   |
| `/gpu/cuda/shared`         | Optimized pure CUDA kernels using shared memory   | Yes                   |
| `/gpu/cuda/gen`            | Optimized pure CUDA kernels using code generation | No                    |
||
| **HIP Native**             |
| `/gpu/hip/ref`             | Reference pure HIP kernels                        | Yes                   |
| `/gpu/hip/shared`          | Optimized pure HIP kernels using shared memory    | Yes                   |
| `/gpu/hip/gen`             | Optimized pure HIP kernels using code generation  | No                    |
||
| **MAGMA**                  |
| `/gpu/cuda/magma`          | CUDA MAGMA kernels                                | No                    |
| `/gpu/cuda/magma/det`      | CUDA MAGMA kernels (deterministic)                | Yes                   |
| `/gpu/hip/magma`           | HIP MAGMA kernels                                 | No                    |
| `/gpu/hip/magma/det`       | HIP MAGMA kernels (deterministic)                 | Yes                   |
||
| **OCCA**                   |
| `/*/occa`                  | Selects backend based on available OCCA modes     | Yes                   |
| `/cpu/self/occa`           | OCCA backend with serial CPU kernels              | Yes                   |
| `/cpu/openmp/occa`         | OCCA backend with OpenMP kernels                  | Yes                   |
| `/gpu/cuda/occa`           | OCCA backend with CUDA kernels                    | Yes                   |
| `/gpu/hip/occa`            | OCCA backend with HIP kernels                     | Yes                   |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes
with a smaller number of high-order elements. The `/cpu/self/*/blocked` backends process
blocked batches of eight interlaced elements and are intended for meshes with higher numbers
of elements.

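The interlaced layout can be pictured with a small pure-Python sketch (illustrative only; libCEED's actual internal layout is a backend detail). Nodes are regrouped so that the same node of every element in a batch sits contiguously, which is what lets a vector unit process the whole batch in lockstep:

```python
# Group elements into batches of eight, then store node i of every element
# in a batch contiguously (interlaced order: e0n0, e1n0, ..., e0n1, e1n1, ...).
num_elem, nodes_per_elem, block = 10, 3, 8
# element e holds values e*100 + node index, just to make positions visible
elems = [[e * 100 + i for i in range(nodes_per_elem)] for e in range(num_elem)]

batches = []
for start in range(0, num_elem, block):
    batch = elems[start:start + block]
    batches.append([batch[e][i] for i in range(nodes_per_elem)
                    for e in range(len(batch))])
```
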
The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use, run your code with
Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP. They are based on
the `/gpu/cuda/*` backends.  ROCm version 4.2 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level
MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends
with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent
directory, or set `MAGMA_DIR` to the proper location.  MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is only built for either CUDA or HIP.  The corresponding
set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built
for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#`
after the resource name.  For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance. To enable the OCCA backend, the environment variable `OCCA_DIR` must point
to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib`.
By default, `OCCA_DIR` is set to `../occa`.

Additionally, users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends which are capable of generating reproducible results, with the proper compilation options, are highlighted in the list above.
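
The underlying reason is that floating-point addition is not associative, so the order in which `atomicAdd` contributions land changes the rounded sum. A minimal Python illustration:

```python
# The same three numbers summed in two orders give different rounded results.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c   # cancellation happens first, so the 1.0 survives
right = a + (b + c)  # the 1.0 is absorbed into -1e16 before cancellation
assert left != right
```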

## Examples

libCEED comes with several examples of its usage, ranging from standalone C
codes in the `/examples/ceed` directory to examples based on external packages,
such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and
`NEK5K_DIR` variables and run:

```
cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
cd ceed/
make
./ex1-volume -ceed /cpu/self
./ex1-volume -ceed /gpu/cuda
./ex2-surface -ceed /cpu/self
./ex2-surface -ceed /gpu/cuda
cd ..

# MFEM+libCEED examples on CPU and GPU
cd mfem/
make
./bp1 -ceed /cpu/self -no-vis
./bp3 -ceed /gpu/cuda -no-vis
cd ..

# Nek5000+libCEED examples on CPU and GPU
cd nek/
make
./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
cd ..

# PETSc+libCEED examples on CPU and GPU
cd petsc/
make
./bps -problem bp1 -ceed /cpu/self
./bps -problem bp2 -ceed /gpu/cuda
./bps -problem bp3 -ceed /cpu/self
./bps -problem bp4 -ceed /gpu/cuda
./bps -problem bp5 -ceed /cpu/self
./bps -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpsraw -problem bp1 -ceed /cpu/self
./bpsraw -problem bp2 -ceed /gpu/cuda
./bpsraw -problem bp3 -ceed /cpu/self
./bpsraw -problem bp4 -ceed /gpu/cuda
./bpsraw -problem bp5 -ceed /cpu/self
./bpsraw -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpssphere -problem bp1 -ceed /cpu/self
./bpssphere -problem bp2 -ceed /gpu/cuda
./bpssphere -problem bp3 -ceed /cpu/self
./bpssphere -problem bp4 -ceed /gpu/cuda
./bpssphere -problem bp5 -ceed /cpu/self
./bpssphere -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./area -problem cube -ceed /cpu/self -degree 3
./area -problem cube -ceed /gpu/cuda -degree 3
./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
cd ..

cd fluids/
make
./navierstokes -ceed /cpu/self -degree 1
./navierstokes -ceed /gpu/cuda -degree 1
cd ..

cd solids/
make
./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
cd ..
```

For the last example shown, sample meshes to be used in place of
`[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends
enabled. Depending on the available backends, other CEED resource
specifiers can be provided with the `-ceed` option. Other command-line
arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```
make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory
and they can be viewed using the following commands (requires Python with Matplotlib):

```
cd benchmarks
python postprocess-plot.py petsc-bps-bp1-*-output.txt
python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks which may
take some time to run. Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```
make install prefix=/path/to/install/dir
```

or (e.g., if creating packages):

```
make install prefix=/usr DESTDIR=/packaging/path
```

To build and install in separate steps, run:

```
make for_install=1 prefix=/path/to/install/dir
make install prefix=/path/to/install/dir
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags
for all languages can be set using the likes of `OPT='-O3 -march=native'`. Use
`STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```
pip install libceed
```

with the desired pip options, such as `--user`.

### pkg-config

In addition to the library and headers, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config)
file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if
`$prefix` is a standard location or you set the environment variable
`PKG_CONFIG_PATH`:

```
cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED.  This can be used with the source or
installed directories.  Most build systems have support for pkg-config.
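
In a makefile, the same pkg-config flags can be captured with `$(shell ...)`; a minimal sketch (the `myapp` target simply mirrors the command above):

```make
# Makefile fragment linking myapp against libCEED via pkg-config
CFLAGS += $(shell pkg-config --cflags ceed)
LDLIBS += $(shell pkg-config --libs ceed)

myapp: myapp.c
	$(CC) $(CFLAGS) $< $(LDLIBS) -o $@
```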

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov)
or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you utilize libCEED, please cite:

```
@article{libceed-joss-paper,
  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal      = {Journal of Open Source Software},
  year         = {2021},
  publisher    = {The Open Journal},
  volume       = {6},
  number       = {63},
  pages        = {2945},
  doi          = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author       = {Abdelfattah, Ahmad and
                  Barra, Valeria and
                  Beams, Natalie and
                  Brown, Jed and
                  Camier, Jean-Sylvain and
                  Dobrev, Veselin and
                  Dudouit, Yohann and
                  Ghaffari, Leila and
                  Kolev, Tzanio and
                  Medina, David and
                  Pazner, Will and
                  Ratnayaka, Thilina and
                  Thompson, Jeremy L and
                  Tomov, Stanimire},
  title        = {{libCEED} User Manual},
  month        = jul,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {0.9.0},
  doi          = {10.5281/zenodo.5077489}
}
```

For libCEED's Python interface, please cite:

```
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the
`doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless
otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.

[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
[github-link]: https://github.com/CEED/libCEED/actions
[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
[codecov-link]: https://codecov.io/gh/CEED/libCEED/
[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
[license-link]: https://opensource.org/licenses/BSD-2-Clause
[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
[doc-link]: https://libceed.org/en/latest/?badge=latest
[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
[joss-link]: https://doi.org/10.21105/joss.02945
[binder-badge]: http://mybinder.org/badge_logo.svg
[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/python/tutorial-0-ceed.ipynb