# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Azure Pipelines][azure-badge]][azure-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for
performance portability, run-time flexibility, and clean embedding in
higher-level libraries and applications. It offers a C99 interface as well as
bindings for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [user manual](https://libceed.readthedocs.io/en/latest/) and
the API implementation portion of the
[documentation](https://libceed.readthedocs.io/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, both in terms
of the FLOPs needed for its evaluation and the memory transfer needed for a
matvec. Thus, high-order methods require a new "format" that still represents
a linear (or more generally non-linear) operator, but not through a sparse
matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on an algebraically
[factored form](https://libceed.readthedocs.io/en/latest/libCEEDapi/#finite-element-operator-decomposition),
which is easy to incorporate in a wide variety of applications without
significant refactoring of their own discretization infrastructure.

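The factored form can be made concrete with a small sketch. The following plain-Python example (illustrative only, not the libCEED API) applies a 1D mass operator as v = Eᵀ Bᵀ D B E u, where E is the element restriction, B interpolates nodal values to quadrature points, and D is the pointwise quadrature scaling, and verifies that the unassembled application matches the assembled matrix:

```python
# Illustrative sketch (plain Python, not the libCEED API): apply a 1D mass
# operator in the factored form v = E^T B^T D B E u without assembling it.
import math

nelem, p, q = 4, 2, 2              # elements, nodes/element, quad points/element
nnodes = nelem + 1                 # linear elements sharing endpoints
h = 1.0 / nelem                    # uniform element size on [0, 1]

# E: element restriction (global node indices for each element)
E = [[e, e + 1] for e in range(nelem)]

# B: basis interpolation at 2-point Gauss quadrature on [-1, 1]
g = 1.0 / math.sqrt(3.0)
B = [[(1 - x) / 2, (1 + x) / 2] for x in (-g, g)]   # q x p

# D: quadrature weight times Jacobian (w = 1 at each Gauss point, J = h/2)
D = [h / 2, h / 2]

def apply_factored(u):
    v = [0.0] * nnodes
    for dofs in E:                          # gather:  u_e = E u
        ue = [u[i] for i in dofs]
        Bu = [sum(B[k][j] * ue[j] for j in range(p)) for k in range(q)]
        DBu = [D[k] * Bu[k] for k in range(q)]
        BtDBu = [sum(B[k][j] * DBu[k] for k in range(q)) for j in range(p)]
        for j, i in enumerate(dofs):        # scatter: v += E^T (...)
            v[i] += BtDBu[j]
    return v

# Cross-check against the assembled (dense) matrix applied to the same vector
A = [[0.0] * nnodes for _ in range(nnodes)]
for dofs in E:
    for j1 in range(p):
        for j2 in range(p):
            A[dofs[j1]][dofs[j2]] += sum(D[k] * B[k][j1] * B[k][j2] for k in range(q))

u = [float(i) for i in range(nnodes)]
v_fact = apply_factored(u)
v_asm = [sum(A[i][j] * u[j] for j in range(nnodes)) for i in range(nnodes)]
assert all(abs(a - b) < 1e-14 for a, b in zip(v_fact, v_asm))
```

The point of the format is that the assembled matrix `A` is never needed: only the small, structured factors E, B, and D are stored and applied.
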
The repository is part of the
[CEED software suite](http://ceed.exascaleproject.org/software/), a collection of
software benchmarks, miniapps, libraries, and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the
[Exascale Computing Project](https://exascaleproject.org/exascale-computing-project)
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a
[capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering and early testbed
platforms, in support of the nation's exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.readthedocs.io/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies, and
with Fortran, Python, Julia, and Rust interfaces. It can be built using:

```
make
```

or, with optimization flags:

```
make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran), and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```
make AVX=1
```

or:

```
make AVX=0
```

if your compiler does not support gcc-style options, if you are cross
compiling, etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory
to your `make` invocation. To enable HIP support, add `HIP_DIR=/opt/rocm` or
an appropriate directory. To store these or other arguments as defaults for
future invocations of `make`, use:

```
make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

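For reference, `config.mk` is a plain makefile fragment of variable assignments; a hypothetical example corresponding to the invocation above might look like:

```
# config.mk -- stored defaults from `make configure`; edit or delete to reconfigure
CUDA_DIR = /usr/local/cuda
HIP_DIR = /opt/rocm
OPT = -O3 -march=znver2
```
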
## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```
pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```
$ julia
julia> ]
pkg> add LibCEED
```

in the Julia package manager, or in a clone of the repository via:

```
JULIA_LIBCEED_LIB=/path/to/libceed.so julia
julia> # press ] to enter package manager
(env) pkg> build LibCEED
```

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```
make test
```

or, using the `prove` tool distributed with Perl (recommended):

```
make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource              | Backend                                           | Deterministic Capable |
| :---                       | :---                                              | :---:                 |
||
| **CPU Native**             |
| `/cpu/self/ref/serial`     | Serial reference implementation                   | Yes                   |
| `/cpu/self/ref/blocked`    | Blocked reference implementation                  | Yes                   |
| `/cpu/self/opt/serial`     | Serial optimized C implementation                 | Yes                   |
| `/cpu/self/opt/blocked`    | Blocked optimized C implementation                | Yes                   |
| `/cpu/self/avx/serial`     | Serial AVX implementation                         | Yes                   |
| `/cpu/self/avx/blocked`    | Blocked AVX implementation                        | Yes                   |
||
| **CPU Valgrind**           |
| `/cpu/self/memcheck/*`     | Memcheck backends, undefined value checks         | Yes                   |
||
| **CPU LIBXSMM**            |
| `/cpu/self/xsmm/serial`    | Serial LIBXSMM implementation                     | Yes                   |
| `/cpu/self/xsmm/blocked`   | Blocked LIBXSMM implementation                    | Yes                   |
||
| **CUDA Native**            |
| `/gpu/cuda/ref`            | Reference pure CUDA kernels                       | Yes                   |
| `/gpu/cuda/shared`         | Optimized pure CUDA kernels using shared memory   | Yes                   |
| `/gpu/cuda/gen`            | Optimized pure CUDA kernels using code generation | No                    |
||
| **HIP Native**             |
| `/gpu/hip/ref`             | Reference pure HIP kernels                        | Yes                   |
| `/gpu/hip/shared`          | Optimized pure HIP kernels using shared memory    | Yes                   |
| `/gpu/hip/gen`             | Optimized pure HIP kernels using code generation  | No                    |
||
| **MAGMA**                  |
| `/gpu/cuda/magma`          | CUDA MAGMA kernels                                | No                    |
| `/gpu/cuda/magma/det`      | CUDA MAGMA kernels (deterministic)                | Yes                   |
| `/gpu/hip/magma`           | HIP MAGMA kernels                                 | No                    |
| `/gpu/hip/magma/det`       | HIP MAGMA kernels (deterministic)                 | Yes                   |
||
| **OCCA**                   |
| `/*/occa`                  | Selects backend based on available OCCA modes     | Yes                   |
| `/cpu/self/occa`           | OCCA backend with serial CPU kernels              | Yes                   |
| `/cpu/openmp/occa`         | OCCA backend with OpenMP kernels                  | Yes                   |
| `/gpu/cuda/occa`           | OCCA backend with CUDA kernels                    | Yes                   |
| `/gpu/hip/occa`            | OCCA backend with HIP kernels                     | Yes                   |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes
with a smaller number of high order elements. The `/cpu/self/*/blocked` backends process
blocked batches of eight interlaced elements and are intended for meshes with higher numbers
of elements.

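To picture what "blocked batches of eight interlaced elements" means, the following sketch (plain Python; an illustration of the layout idea, not libCEED's actual internal storage) packs per-element data so that, for each basis node, the eight values a vector unit needs sit contiguously:

```python
# Illustration of blocking: pack per-element data so a vector unit can
# process 8 elements at once (element index fastest, i.e. interlaced).
blk = 8                      # elements per block
nelem, p = 16, 3             # elements, nodes per element
# per-element data: elem_data[e][j] is node j of element e
elem_data = [[100 * e + j for j in range(p)] for e in range(nelem)]

blocks = []
for b in range(0, nelem, blk):
    # within a block: for each node j, the 8 values for elements b..b+7 are adjacent
    block = [elem_data[b + s][j] for j in range(p) for s in range(blk)]
    blocks.append(block)

# node j of element e lives at offset j*blk + (e % blk) of block e // blk
e, j = 11, 2
assert blocks[e // blk][j * blk + e % blk] == elem_data[e][j]
```

With this layout, a single SIMD load at offset `j*blk` fetches node `j` of all eight elements in the batch, which is why the blocked backends favor meshes with many elements.
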
The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use, run your code with
Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck/serial`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP. They are based on
the `/gpu/cuda/*` backends. ROCm version 3.6 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level
MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends
with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent
directory, or set `MAGMA_DIR` to the proper location. MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is only built for either CUDA or HIP. The corresponding
set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built
for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#`
after the resource name. For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance. To enable the OCCA backends, the environment variable `OCCA_DIR` must
point to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib/`
(by default, `OCCA_DIR` is set to `../occa`).

Additionally, users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for
increased performance.
The backends which are capable of generating reproducible results, with the proper
compilation options, are highlighted in the list above.

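The reproducibility caveat stems from floating-point addition not being associative: when contributions are accumulated with `atomicAdd`, the order of the additions can vary between runs, and different orders can round differently. A minimal, self-contained Python illustration of the underlying effect (not libCEED code):

```python
# Floating-point addition is not associative: summing the same values in a
# different order (as atomic additions may do across runs) changes the rounding.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = 0.0
for v in vals:
    left_to_right += v          # 1e16 + 1.0 rounds back to 1e16

reordered = 0.0
for v in sorted(vals):          # same multiset of values, different order
    reordered += v

print(left_to_right)  # 1.0
print(reordered)      # 0.0
assert left_to_right != reordered
```

Deterministic backends fix the accumulation order (at some performance cost), so repeated runs produce bit-identical results.
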
## Examples

libCEED comes with several examples of its usage, ranging from standalone C
codes in the `/examples/ceed` directory to examples based on external packages,
such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and
`NEK5K_DIR` variables and run:

```
cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
cd ceed/
make
./ex1-volume -ceed /cpu/self
./ex1-volume -ceed /gpu/cuda
./ex2-surface -ceed /cpu/self
./ex2-surface -ceed /gpu/cuda
cd ..

# MFEM+libCEED examples on CPU and GPU
cd mfem/
make
./bp1 -ceed /cpu/self -no-vis
./bp3 -ceed /gpu/cuda -no-vis
cd ..

# Nek5000+libCEED examples on CPU and GPU
cd nek/
make
./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
cd ..

# PETSc+libCEED examples on CPU and GPU
cd petsc/
make
./bps -problem bp1 -ceed /cpu/self
./bps -problem bp2 -ceed /gpu/cuda
./bps -problem bp3 -ceed /cpu/self
./bps -problem bp4 -ceed /gpu/cuda
./bps -problem bp5 -ceed /cpu/self
./bps -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpsraw -problem bp1 -ceed /cpu/self
./bpsraw -problem bp2 -ceed /gpu/cuda
./bpsraw -problem bp3 -ceed /cpu/self
./bpsraw -problem bp4 -ceed /gpu/cuda
./bpsraw -problem bp5 -ceed /cpu/self
./bpsraw -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpssphere -problem bp1 -ceed /cpu/self
./bpssphere -problem bp2 -ceed /gpu/cuda
./bpssphere -problem bp3 -ceed /cpu/self
./bpssphere -problem bp4 -ceed /gpu/cuda
./bpssphere -problem bp5 -ceed /cpu/self
./bpssphere -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./area -problem cube -ceed /cpu/self -degree 3
./area -problem cube -ceed /gpu/cuda -degree 3
./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2

cd fluids/
make
./navierstokes -ceed /cpu/self -degree 1
./navierstokes -ceed /gpu/cuda -degree 1
cd ..

cd solids/
make
./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
cd ..
```

For the last example shown, sample meshes to be used in place of
`[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends
enabled. Depending on the available backends, other CEED resource
specifiers can be provided with the `-ceed` option. Other command line
arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```
make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory,
and they can be viewed using the following commands (requires Python with Matplotlib):

```
cd benchmarks
python postprocess-plot.py petsc-bps-bp1-*-output.txt
python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks, which may
take some time. Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```
make install prefix=/usr/local
```

or (e.g., if creating packages):

```
make install prefix=/usr DESTDIR=/packaging/path
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags
for all languages can be set using the likes of `OPT='-O3 -march=native'`. Use
`STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```
pip install libceed
```

with the desired pip options, such as `--user`.

### pkg-config

In addition to the library and headers, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config)
file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if
`$prefix` is a standard location or you set the environment variable
`PKG_CONFIG_PATH`:

```
cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED. This can be used with the source or
installed directories. Most build systems have support for pkg-config.

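For makefile-based projects, the same flags can be pulled in via `$(shell ...)`; a minimal sketch (assuming `pkg-config` can find `ceed.pc` on its search path):

```
# Illustrative makefile fragment: obtain libCEED flags from pkg-config
CFLAGS += $(shell pkg-config --cflags ceed)
LDLIBS += $(shell pkg-config --libs ceed)
```

With these variables set, `make myapp` builds `myapp` from `myapp.c` using make's built-in rules.
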
## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov)
or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you utilize libCEED, please cite:

```bibtex
@article{libceed-joss-paper,
  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal      = {Journal of Open Source Software},
  year         = {2021},
  publisher    = {The Open Journal},
  volume       = {6},
  number       = {63},
  pages        = {2945},
  doi          = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author       = {Abdelfattah, Ahmad and
                  Barra, Valeria and
                  Beams, Natalie and
                  Brown, Jed and
                  Camier, Jean-Sylvain and
                  Dobrev, Veselin and
                  Dudouit, Yohann and
                  Ghaffari, Leila and
                  Kolev, Tzanio and
                  Medina, David and
                  Pazner, Will and
                  Ratnayaka, Thilina and
                  Thompson, Jeremy L and
                  Tomov, Stanimire},
  title        = {{libCEED} User Manual},
  month        = jul,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {0.9.0},
  doi          = {10.5281/zenodo.5077489}
}
```

For libCEED's Python interface, please cite:

```bibtex
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the
`doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless
otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.

[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
[github-link]: https://github.com/CEED/libCEED/actions
[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
[azure-badge]: https://dev.azure.com/CEED-ECP/libCEED/_apis/build/status/CEED.libCEED?branchName=main
[azure-link]: https://dev.azure.com/CEED-ECP/libCEED/_build?definitionId=2
[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
[codecov-link]: https://codecov.io/gh/CEED/libCEED/
[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
[license-link]: https://opensource.org/licenses/BSD-2-Clause
[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
[doc-link]: https://libceed.readthedocs.io/en/latest/?badge=latest
[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
[joss-link]: https://doi.org/10.21105/joss.02945
[binder-badge]: http://mybinder.org/badge_logo.svg
[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/tutorials/tutorial-0-ceed.ipynb
509