1# libCEED: Efficient Extensible Discretization
2
3[![GitHub Actions][github-badge]][github-link]
4[![GitLab-CI][gitlab-badge]][gitlab-link]
5[![Code coverage][codecov-badge]][codecov-link]
6[![BSD-2-Clause][license-badge]][license-link]
7[![Documentation][doc-badge]][doc-link]
8[![JOSS paper][joss-badge]][joss-link]
9[![Binder][binder-badge]][binder-link]
10
11## Summary and Purpose
12
13libCEED provides fast algebra for element-based discretizations, designed for performance portability, run-time flexibility, and clean embedding in higher level libraries and applications.
14It offers a C99 interface as well as bindings for Fortran, Python, Julia, and Rust.
15While our focus is on high-order finite elements, the approach is mostly algebraic and thus applicable to other discretizations in factored form, as explained in the [user manual](https://libceed.org/en/latest/) and API implementation portion of the [documentation](https://libceed.org/en/latest/api/).
16
17One of the challenges with high-order methods is that a global sparse matrix is no longer a good representation of a high-order linear operator, with respect to both the FLOPs needed for its evaluation and the memory transfer needed for a matvec.
18Thus, high-order methods require a new "format" that still represents a linear (or more generally non-linear) operator, but not through a sparse matrix.
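
As a rough illustration of this scaling argument, the sketch below counts stored values per element for tensor-product hexahedral elements of degree `p`. The specific choices (a scalar problem, `q = p + 2` quadrature points per dimension, six stored values per quadrature point) are assumptions made for the example, not libCEED defaults.

```python
def element_matrix_entries(p: int) -> int:
    """Entries in one dense element matrix for a degree-p hexahedron."""
    nodes = (p + 1) ** 3
    return nodes * nodes


def quadrature_data_entries(p: int, values_per_point: int = 6) -> int:
    """Values stored at quadrature points for a matrix-free operator.

    Assumes q = p + 2 points per dimension (an illustrative choice).
    """
    q = p + 2
    return values_per_point * q ** 3


for p in (1, 2, 4, 8):
    dense = element_matrix_entries(p)
    free = quadrature_data_entries(p)
    print(f"p={p}: element matrix {dense}, quadrature data {free}, "
          f"ratio {dense / free:.1f}")
```

Under these assumptions, element-matrix storage grows like `(p + 1)^6` while quadrature-point data grows like `(p + 2)^3`, so the gap widens quickly with the degree.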
19
20The goal of libCEED is to propose such a format, as well as supporting implementations and data structures, that enable efficient operator evaluation on a variety of computational device types (CPUs, GPUs, etc.).
21This new operator description is based on algebraically [factored form](https://libceed.org/en/latest/libCEEDapi/#finite-element-operator-decomposition), which is easy to incorporate in a wide variety of applications, without significant refactoring of their own discretization infrastructure.
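
To make the idea of a factored operator concrete, here is a minimal pure-Python sketch that applies a 1D mass operator as `G^T B^T D B G` on a single process, without forming a global matrix. The symbols follow the decomposition described in the linked documentation as understood here; the mesh, the quadrature rule, and all names in the code are hypothetical and do not use the libCEED API.

```python
import math

# Two linear elements on [0, 1]; global nodes at 0, 0.5, 1.
h = 0.5
elem_nodes = [(0, 1), (1, 2)]  # G: global node indices of each element

# B: linear shape functions evaluated at the 2-point Gauss nodes on [-1, 1].
gauss = [-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]
B = [[(1.0 - xi) / 2.0, (1.0 + xi) / 2.0] for xi in gauss]

# D: quadrature weight (1.0) times the element Jacobian h / 2.
D = [1.0 * h / 2.0 for _ in gauss]


def apply_mass(u):
    """Compute v = G^T B^T D B G u element by element."""
    v = [0.0] * len(u)
    for nodes in elem_nodes:
        u_e = [u[n] for n in nodes]                    # G: gather to element
        u_q = [sum(B[q][i] * u_e[i] for i in range(2))
               for q in range(2)]                      # B: nodes to quadrature
        v_q = [D[q] * u_q[q] for q in range(2)]        # D: pointwise scaling
        v_e = [sum(B[q][i] * v_q[q] for q in range(2))
               for i in range(2)]                      # B^T: back to nodes
        for i, n in enumerate(nodes):                  # G^T: scatter-add
            v[n] += v_e[i]
    return v


print(apply_mass([1.0, 1.0, 1.0]))
```

Applied to the constant vector, the result approximates the row sums of the mass matrix, and its entries sum to the length of the domain.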
22
23The repository is part of the [CEED software suite](http://ceed.exascaleproject.org/software/), a collection of software benchmarks, miniapps, libraries and APIs for efficient exascale discretizations based on high-order finite element and spectral element methods.
24See <http://github.com/ceed> for more information and source code availability.
25
26The CEED research is supported by the [Exascale Computing Project](https://exascaleproject.org/exascale-computing-project) (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a [capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including software, applications, hardware, advanced system engineering and early testbed platforms, in support of the nation’s exascale computing imperative.
27
28For more details on the CEED API see the [user manual](https://libceed.org/en/latest/).
29
30% gettingstarted-inclusion-marker
31
32## Building
33
34The CEED library, `libceed`, is a C99 library with no required dependencies, and with Fortran, Python, Julia, and Rust interfaces.
35It can be built using:
36
37```
38make
39```
40
41or, with optimization flags:
42
43```
44make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
45```
46
47These optimization flags are used by all languages (C, C++, Fortran) and this makefile variable can also be set for testing and examples (below).
48
49The library attempts to automatically detect support for the AVX instruction set using gcc-style compiler options for the host.
50Support may need to be manually specified via:
51
52```
53make AVX=1
54```
55
56or:
57
58```
59make AVX=0
60```
61
62if your compiler does not support gcc-style options, if you are cross compiling, etc.
63
64To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory to your `make` invocation.
65To enable HIP support, add `HIP_DIR=/opt/rocm` or an appropriate directory.
66To store these or other arguments as defaults for future invocations of `make`, use:
67
68```
69make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
70```
71
72which stores these variables in `config.mk`.
73
74## Additional Language Interfaces
75
76The Fortran interface is built alongside the library automatically.
77
78Python users can install using:
79
80```
81pip install libceed
82```
83
84or in a clone of the repository via `pip install .`.
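
A quick way to check the Python installation is to create a context and a small vector. This is a hedged sketch: the names `libceed.Ceed`, `Vector`, `set_value`, `norm`, and `NORM_2` are assumed from the Python interface and should be checked against its documentation, and the helper `check_libceed` is hypothetical.

```python
try:
    import libceed
except ImportError:  # libceed is not installed in this environment
    libceed = None


def check_libceed(resource="/cpu/self"):
    """Return the 2-norm of a small constant vector, or None if unavailable."""
    if libceed is None:
        return None
    ceed = libceed.Ceed(resource)
    x = ceed.Vector(4)
    x.set_value(3.0)
    # Assumed signature: norm(normtype=...) with the NORM_2 constant.
    return x.norm(normtype=libceed.NORM_2)


print(check_libceed())
```

If the package is importable, the function returns the norm of a length-4 vector filled with 3.0; otherwise it returns `None`.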
85
86Julia users can install using:
87
88```
89$ julia
90julia> ]
91pkg> add LibCEED
92```
93
94See the [LibCEED.jl documentation](http://ceed.exascaleproject.org/libCEED-julia-docs/dev/) for more information.
95
96Rust users can include libCEED via `Cargo.toml`:
97
98```toml
99[dependencies]
100libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
101```
102
103See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.
104
105## Testing
106
107The test suite produces [TAP](https://testanything.org) output and is run by:
108
109```
110make test
111```
112
113or, using the `prove` tool distributed with Perl (recommended):
114
115```
116make prove
117```
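
For readers unfamiliar with TAP, the sketch below shows the general shape of such output and one simple way to summarize it. The sample text and test names are made up for illustration and are not actual libCEED test output.

```python
def summarize_tap(text: str):
    """Count passing and failing test lines in TAP output.

    Lines carrying a SKIP directive still start with "ok" and are
    counted as passing here.
    """
    passed = failed = 0
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("not ok"):
            failed += 1
        elif line.startswith("ok"):
            passed += 1
    return passed, failed


# Hypothetical TAP output; the test names are invented for this example.
sample = """\
1..3
ok 1 - example-test-a
not ok 2 - example-test-b
ok 3 - example-test-c # SKIP backend not available
"""
print(summarize_tap(sample))
```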
118
119## Backends
120
121There are multiple supported backends, which can be selected at runtime in the examples:
122
123| CEED resource              | Backend                                           | Deterministic Capable |
124| :---                       | :---                                              | :---:                 |
125||
126| **CPU Native**             |
127| `/cpu/self/ref/serial`     | Serial reference implementation                   | Yes                   |
128| `/cpu/self/ref/blocked`    | Blocked reference implementation                  | Yes                   |
129| `/cpu/self/opt/serial`     | Serial optimized C implementation                 | Yes                   |
130| `/cpu/self/opt/blocked`    | Blocked optimized C implementation                | Yes                   |
131| `/cpu/self/avx/serial`     | Serial AVX implementation                         | Yes                   |
132| `/cpu/self/avx/blocked`    | Blocked AVX implementation                        | Yes                   |
133||
134| **CPU Valgrind**           |
135| `/cpu/self/memcheck/*`     | Memcheck backends, undefined value checks         | Yes                   |
136||
137| **CPU LIBXSMM**            |
138| `/cpu/self/xsmm/serial`    | Serial LIBXSMM implementation                     | Yes                   |
139| `/cpu/self/xsmm/blocked`   | Blocked LIBXSMM implementation                    | Yes                   |
140||
141| **CUDA Native**            |
142| `/gpu/cuda/ref`            | Reference pure CUDA kernels                       | Yes                   |
143| `/gpu/cuda/shared`         | Optimized pure CUDA kernels using shared memory   | Yes                   |
144| `/gpu/cuda/gen`            | Optimized pure CUDA kernels using code generation | No                    |
145||
146| **HIP Native**             |
147| `/gpu/hip/ref`             | Reference pure HIP kernels                        | Yes                   |
148| `/gpu/hip/shared`          | Optimized pure HIP kernels using shared memory    | Yes                   |
149| `/gpu/hip/gen`             | Optimized pure HIP kernels using code generation  | No                    |
150||
151| **MAGMA**                  |
152| `/gpu/cuda/magma`          | CUDA MAGMA kernels                                | No                    |
153| `/gpu/cuda/magma/det`      | CUDA MAGMA kernels                                | Yes                   |
154| `/gpu/hip/magma`           | HIP MAGMA kernels                                 | No                    |
155| `/gpu/hip/magma/det`       | HIP MAGMA kernels                                 | Yes                   |
156||
157| **OCCA**                   |
158| `/*/occa`                  | Selects backend based on available OCCA modes     | Yes                   |
159| `/cpu/self/occa`           | OCCA backend with serial CPU kernels              | Yes                   |
160| `/cpu/openmp/occa`         | OCCA backend with OpenMP kernels                  | Yes                   |
161| `/cpu/dpcpp/occa`          | OCCA backend with DPC++ kernels                   | Yes                   |
162| `/gpu/cuda/occa`           | OCCA backend with CUDA kernels                    | Yes                   |
163| `/gpu/hip/occa`            | OCCA backend with HIP kernels                     | Yes                   |
164
165The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes with a smaller number of high-order elements.
166The `/cpu/self/*/blocked` backends process blocked batches of eight interlaced elements and are intended for meshes with higher numbers of elements.
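
One plausible reading of "interlaced" is that values belonging to the same node of the eight elements in a block are stored next to each other, so a single vector instruction can process all eight elements at once. The sketch below illustrates that layout only; the function name and indexing are hypothetical, and the actual memory layout is defined by the backend sources.

```python
BLOCK = 8  # block size stated above


def interlace(block_values):
    """Interlace per-element arrays so the element index varies fastest.

    block_values[e][n] is the value at node n of element e.
    Returns a flat list with layout [n * BLOCK + e].
    """
    assert len(block_values) == BLOCK
    num_nodes = len(block_values[0])
    return [block_values[e][n]
            for n in range(num_nodes)
            for e in range(BLOCK)]


# Eight elements with three nodes each; each value encodes (element, node).
elements = [[10 * e + n for n in range(3)] for e in range(BLOCK)]
flat = interlace(elements)
print(flat[:BLOCK])  # node 0 of all eight elements, stored contiguously
```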
167
168The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.
169
170The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.
171
172The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.
173
174The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool to help verify that user QFunctions have no undefined values.
175To use, run your code with Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck/serial`.
176A 'development' or 'debugging' version of Valgrind with headers is required to use this backend.
177This backend can be run in serial or blocked mode and defaults to running in the serial mode if `/cpu/self/memcheck` is selected at runtime.
178
179The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package to provide vectorized CPU performance.
180If linking MKL and LIBXSMM is desired but the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be forced by setting the environment variable `MKL=1`.
181
182The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.
183
184The `/gpu/hip/*` backends provide GPU performance strictly using HIP.
185They are based on the `/gpu/cuda/*` backends.
186ROCm version 4.2 or newer is required.
187
188The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
189To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
190By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent directory, or set `MAGMA_DIR` to the proper location.
191MAGMA version 2.5.0 or newer is required.
192Currently, each MAGMA library installation is only built for either CUDA or HIP.
193The corresponding set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built for the version of the MAGMA library found in `MAGMA_DIR`.
194
195Users can specify a device for all CUDA, HIP, and MAGMA backends by appending `:device_id=#` to the resource name.
196For example:
197
198> - `/gpu/cuda/gen:device_id=1`
199
200The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide cross platform performance.
201To enable the OCCA backend, the environment variable `OCCA_DIR` must point to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib` (by default, `OCCA_DIR` is set to `../occa`).
202OCCA version 1.4.0 or newer is required.
203
204Users can pass specific OCCA device properties after setting the CEED resource.
205For example:
206
207> - `"/*/occa:mode='CUDA',device_id=0"`
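
The resource strings shown above combine a backend path with optional `key=value` settings after a colon. The following sketch splits such a string into its parts to show the structure; it is an illustration only, not libCEED's own parser, and `split_resource` is a hypothetical helper.

```python
def split_resource(resource: str):
    """Split 'backend:key=value,...' into (backend, options dict)."""
    backend, _, option_text = resource.partition(":")
    options = {}
    for item in filter(None, option_text.split(",")):
        key, _, value = item.partition("=")
        # Drop surrounding quotes such as mode='CUDA'.
        options[key.strip()] = value.strip().strip("'\"")
    return backend, options


print(split_resource("/gpu/cuda/gen:device_id=1"))
print(split_resource("/*/occa:mode='CUDA',device_id=0"))
```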
208
209Bit-for-bit reproducibility is important in some applications.
210However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
211The backends that can generate reproducible results, given the proper compilation options, are marked in the Deterministic Capable column of the table above.
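
The underlying reason is that floating-point addition is not associative, so accumulating the same contributions in a different order can change the last bits of the result. The sketch below demonstrates this on the CPU in plain Python; the random data is arbitrary and only stands in for contributions that concurrent threads might add in a varying order.

```python
import random

a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False: rounding depends on grouping

# Sum the same contributions in two different orders.
random.seed(0)
contributions = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
                 for _ in range(1000)]
shuffled = contributions[:]
random.shuffle(shuffled)
print(sum(contributions) - sum(shuffled))
```

A deterministic backend avoids this by fixing the order of accumulation, typically at some cost in performance.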
212
213## Examples
214
215libCEED comes with several examples of its usage, ranging from standalone C codes in the `/examples/ceed` directory to examples based on external packages, such as MFEM, PETSc, and Nek5000.
216Nek5000 v18.0 or greater is required.
217
218To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and `NEK5K_DIR` variables and run:
219
220```
221cd examples/
222```
223
224% running-examples-inclusion-marker
225
226```console
227# libCEED examples on CPU and GPU
228cd ceed/
229make
230./ex1-volume -ceed /cpu/self
231./ex1-volume -ceed /gpu/cuda
232./ex2-surface -ceed /cpu/self
233./ex2-surface -ceed /gpu/cuda
234cd ..
235
236# MFEM+libCEED examples on CPU and GPU
237cd mfem/
238make
239./bp1 -ceed /cpu/self -no-vis
240./bp3 -ceed /gpu/cuda -no-vis
241cd ..
242
243# Nek5000+libCEED examples on CPU and GPU
244cd nek/
245make
246./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
247./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
248cd ..
249
250# PETSc+libCEED examples on CPU and GPU
251cd petsc/
252make
253./bps -problem bp1 -ceed /cpu/self
254./bps -problem bp2 -ceed /gpu/cuda
255./bps -problem bp3 -ceed /cpu/self
256./bps -problem bp4 -ceed /gpu/cuda
257./bps -problem bp5 -ceed /cpu/self
258./bps -problem bp6 -ceed /gpu/cuda
259cd ..
260
261cd petsc/
262make
263./bpsraw -problem bp1 -ceed /cpu/self
264./bpsraw -problem bp2 -ceed /gpu/cuda
265./bpsraw -problem bp3 -ceed /cpu/self
266./bpsraw -problem bp4 -ceed /gpu/cuda
267./bpsraw -problem bp5 -ceed /cpu/self
268./bpsraw -problem bp6 -ceed /gpu/cuda
269cd ..
270
271cd petsc/
272make
273./bpssphere -problem bp1 -ceed /cpu/self
274./bpssphere -problem bp2 -ceed /gpu/cuda
275./bpssphere -problem bp3 -ceed /cpu/self
276./bpssphere -problem bp4 -ceed /gpu/cuda
277./bpssphere -problem bp5 -ceed /cpu/self
278./bpssphere -problem bp6 -ceed /gpu/cuda
279cd ..
280
281cd petsc/
282make
283./area -problem cube -ceed /cpu/self -degree 3
284./area -problem cube -ceed /gpu/cuda -degree 3
285./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
286./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
287
288cd ../fluids/
289make
290./navierstokes -ceed /cpu/self -degree 1
291./navierstokes -ceed /gpu/cuda -degree 1
292cd ..
293
294cd solids/
295make
296./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
297./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
298cd ..
299```
300
301For the last example shown, sample meshes to be used in place of `[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.
302
303The above code assumes a GPU-capable machine with the CUDA backends enabled.
304Depending on the available backends, other CEED resource specifiers can be provided with the `-ceed` option.
305Other command line arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).
306
307% benchmarks-marker
308
309## Benchmarks
310
311A sequence of benchmarks for all enabled backends can be run using:
312
313```
314make benchmarks
315```
316
317The results from the benchmarks are stored inside the `benchmarks/` directory and can be viewed using the following commands (requires Python with matplotlib):
318
319```
320cd benchmarks
321python postprocess-plot.py petsc-bps-bp1-*-output.txt
322python postprocess-plot.py petsc-bps-bp3-*-output.txt
323```
324
325Using the `benchmarks` target runs a comprehensive set of benchmarks which may take some time to run.
326Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.
327
328For more details about the benchmarks, see the `benchmarks/README.md` file.
329
330## Install
331
332To install libCEED, run:
333
334```
335make install prefix=/path/to/install/dir
336```
337
338or (e.g., if creating packages):
339
340```
341make install prefix=/usr DESTDIR=/packaging/path
342```
343
344To build and install in separate steps, run:
345
346```
347make for_install=1 prefix=/path/to/install/dir
348make install prefix=/path/to/install/dir
349```
350
351The usual variables like `CC` and `CFLAGS` are used, and optimization flags for all languages can be set with, e.g., `OPT='-O3 -march=native'`.
352Use `STATIC=1` to build static libraries (`libceed.a`).
353
354To install libCEED for Python, run:
355
356```
357pip install libceed
358```
359
360with the desired setuptools options, such as `--user`.
361
362### pkg-config
363
364In addition to the library and header, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config) file that can be used to easily compile and link.
365[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if `$prefix` is a standard location or you set the environment variable `PKG_CONFIG_PATH`:
366
367```
368cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
369```
370
371will build `myapp` with libCEED.
372This can be used with the source or installed directories.
373Most build systems have support for pkg-config.
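
For build scripts written in Python, the same query can be made programmatically. This is a minimal sketch that assumes the `pkg-config` executable is on the `PATH` and that the `ceed` package file is discoverable; `ceed_compile_command` is a hypothetical helper, and it only assembles the command rather than running the compiler.

```python
import shlex
import shutil
import subprocess


def ceed_compile_command(source="myapp.c", output="myapp"):
    """Return the argv for the cc command above, or None if ceed is not found."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--cflags", "--libs", "ceed"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None  # the ceed package file is not on the search path
    return ["cc", *shlex.split(result.stdout), "-o", output, source]


print(ceed_compile_command())
```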
374
375## Contact
376
377You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov) or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).
378
379## How to Cite
380
381If you use libCEED, please cite:
382
383```
384@article{libceed-joss-paper,
385  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
386  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
387  journal      = {Journal of Open Source Software},
388  year         = {2021},
389  publisher    = {The Open Journal},
390  volume       = {6},
391  number       = {63},
392  pages        = {2945},
393  doi          = {10.21105/joss.02945}
394}
395
396@misc{libceed-user-manual,
397  author       = {Abdelfattah, Ahmad and
398                  Barra, Valeria and
399                  Beams, Natalie and
400                  Brown, Jed and
401                  Camier, Jean-Sylvain and
402                  Dobrev, Veselin and
403                  Dudouit, Yohann and
404                  Ghaffari, Leila and
405                  Kolev, Tzanio and
406                  Medina, David and
407                  Pazner, Will and
408                  Ratnayaka, Thilina and
409                  Thompson, Jeremy L and
410                  Tomov, Stanimire},
411  title        = {{libCEED} User Manual},
412  month        = jul,
413  year         = 2021,
414  publisher    = {Zenodo},
415  version      = {0.9.0},
416  doi          = {10.5281/zenodo.5077489}
417}
418```
419
420For libCEED's Python interface please cite:
421
422```
423@InProceedings{libceed-paper-proc-scipy-2020,
424  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
425  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
426  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
427  pages     = {85 - 90},
428  year      = {2020},
429  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
430  doi       = {10.25080/Majora-342d178e-00c}
431}
432```
433
434The BibTeX entries for these references can be found in the `doc/bib/references.bib` file.
435
436## Copyright
437
438The following copyright applies to each file in the CEED software suite, unless otherwise stated in the file:
439
440> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
441> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.
442
443See files LICENSE and NOTICE for details.
444
445[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
446[github-link]: https://github.com/CEED/libCEED/actions
447[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
448[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
449[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
450[codecov-link]: https://codecov.io/gh/CEED/libCEED/
451[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
452[license-link]: https://opensource.org/licenses/BSD-2-Clause
453[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
454[doc-link]: https://libceed.org/en/latest/?badge=latest
455[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
456[joss-link]: https://doi.org/10.21105/joss.02945
457[binder-badge]: http://mybinder.org/badge_logo.svg
458[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/python/tutorial-0-ceed.ipynb
459