# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Azure Pipelines][azure-badge]][azure-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for
performance portability, run-time flexibility, and clean embedding in
higher-level libraries and applications. It offers a C99 interface as well as
bindings for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [user manual](https://libceed.readthedocs.io/en/latest/) and
the API implementation portion of the
[documentation](https://libceed.readthedocs.io/en/latest/api/).
One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, both with
respect to the FLOPs needed for its evaluation and the memory transfer needed
for a matvec.  Thus, high-order methods require a new "format" that still
represents a linear (or more generally non-linear) operator, but not through a
sparse matrix.
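
To make the storage growth concrete, here is a back-of-the-envelope sketch in Python (the `(2p + 1)**3` coupling count is a standard estimate for 3D tensor-product elements, not a libCEED measurement):

```python
def nnz_per_row_3d(p):
    """Approximate nonzeros per interior row of an assembled 3D
    tensor-product operator of degree p: each DOF couples to every DOF
    sharing an element with it, roughly (2p + 1)**3 neighbors."""
    return (2 * p + 1) ** 3

# Assembled-matrix row cost grows rapidly with polynomial order, while
# matrix-free storage per DOF stays roughly constant.
for p in (1, 2, 4, 8):
    print(f"p = {p}: ~{nnz_per_row_3d(p)} nonzeros per row")
```

At degree 8 an assembled row already holds thousands of entries, which is why a matrix-free "format" pays off.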

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on an algebraically
[factored form](https://libceed.readthedocs.io/en/latest/libCEEDapi/#finite-element-operator-decomposition),
which is easy to incorporate into a wide variety of applications without
significant refactoring of their own discretization infrastructure.
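
As an illustration of the decomposition, here is a minimal NumPy sketch of a 1D mass operator applied in factored form (a conceptual example only, not libCEED's API; in libCEED the analogous roles are played by element restriction, basis, and QFunction objects):

```python
import numpy as np

n = 5                      # number of elements on [0, 1]
h = 1.0 / n                # uniform element size (Jacobian determinant)

# 2-point Gauss rule on the reference element [0, 1]
q = np.array([0.5 - 0.5 / np.sqrt(3), 0.5 + 0.5 / np.sqrt(3)])
w = np.array([0.5, 0.5])

# Basis interpolation matrix B: (quadrature points) x (element DOFs)
B = np.column_stack([1 - q, q])

def apply_mass(u):
    """Matrix-free y = M u: restriction, basis, pointwise scale, transpose."""
    y = np.zeros_like(u)
    for e in range(n):
        dofs = [e, e + 1]
        ue = u[dofs]            # element restriction (gather)
        uq = B @ ue             # interpolate to quadrature points
        vq = w * h * uq         # pointwise: quadrature weights x Jacobian
        y[dofs] += B.T @ vq     # apply transpose basis, then scatter
    return y

# Cross-check against the assembled global mass matrix
M = np.zeros((n + 1, n + 1))
for e in range(n):
    Me = B.T @ np.diag(w * h) @ B
    M[np.ix_([e, e + 1], [e, e + 1])] += Me

u = np.random.default_rng(0).standard_normal(n + 1)
assert np.allclose(apply_mass(u), M @ u)
```

The matrix-free path never forms `M`; it only stores the small reference-element basis and per-quadrature-point data, which is the essence of the factored form above.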

The repository is part of the
[CEED software suite](http://ceed.exascaleproject.org/software/), a collection of
software benchmarks, miniapps, libraries and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the
[Exascale Computing Project](https://exascaleproject.org/exascale-computing-project)
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a
[capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering and early testbed
platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.readthedocs.io/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies, and
with Fortran, Python, Julia, and Rust interfaces.  It can be built using:

```
make
```

or, with optimization flags:

```
make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran) and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```
make AVX=1
```

or:

```
make AVX=0
```

if your compiler does not support gcc-style options or if you are
cross-compiling.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory
to your `make` invocation. To enable HIP support, add `HIP_DIR=/opt/rocm` or
an appropriate directory. To store these or other arguments as defaults for
future invocations of `make`, use:

```
make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```
pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```
$ julia
julia> ]
pkg> add LibCEED
```

See the [LibCEED.jl documentation](http://ceed.exascaleproject.org/libCEED-julia-docs/dev/)
for more information.

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```
make test
```

or, using the `prove` tool distributed with Perl (recommended):

```
make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource              | Backend                                           | Deterministic Capable |
| :---                       | :---                                              | :---:                 |
||
| **CPU Native**             |
| `/cpu/self/ref/serial`     | Serial reference implementation                   | Yes                   |
| `/cpu/self/ref/blocked`    | Blocked reference implementation                  | Yes                   |
| `/cpu/self/opt/serial`     | Serial optimized C implementation                 | Yes                   |
| `/cpu/self/opt/blocked`    | Blocked optimized C implementation                | Yes                   |
| `/cpu/self/avx/serial`     | Serial AVX implementation                         | Yes                   |
| `/cpu/self/avx/blocked`    | Blocked AVX implementation                        | Yes                   |
||
| **CPU Valgrind**           |
| `/cpu/self/memcheck/*`     | Memcheck backends, undefined value checks         | Yes                   |
||
| **CPU LIBXSMM**            |
| `/cpu/self/xsmm/serial`    | Serial LIBXSMM implementation                     | Yes                   |
| `/cpu/self/xsmm/blocked`   | Blocked LIBXSMM implementation                    | Yes                   |
||
| **CUDA Native**            |
| `/gpu/cuda/ref`            | Reference pure CUDA kernels                       | Yes                   |
| `/gpu/cuda/shared`         | Optimized pure CUDA kernels using shared memory   | Yes                   |
| `/gpu/cuda/gen`            | Optimized pure CUDA kernels using code generation | No                    |
||
| **HIP Native**             |
| `/gpu/hip/ref`             | Reference pure HIP kernels                        | Yes                   |
| `/gpu/hip/shared`          | Optimized pure HIP kernels using shared memory    | Yes                   |
| `/gpu/hip/gen`             | Optimized pure HIP kernels using code generation  | No                    |
||
| **MAGMA**                  |
| `/gpu/cuda/magma`          | CUDA MAGMA kernels                                | No                    |
| `/gpu/cuda/magma/det`      | Deterministic CUDA MAGMA kernels                  | Yes                   |
| `/gpu/hip/magma`           | HIP MAGMA kernels                                 | No                    |
| `/gpu/hip/magma/det`       | Deterministic HIP MAGMA kernels                   | Yes                   |
||
| **OCCA**                   |
| `/*/occa`                  | Selects backend based on available OCCA modes     | Yes                   |
| `/cpu/self/occa`           | OCCA backend with serial CPU kernels              | Yes                   |
| `/cpu/openmp/occa`         | OCCA backend with OpenMP kernels                  | Yes                   |
| `/gpu/cuda/occa`           | OCCA backend with CUDA kernels                    | Yes                   |
| `/gpu/hip/occa`            | OCCA backend with HIP kernels                     | Yes                   |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes
with a smaller number of high-order elements. The `/cpu/self/*/blocked` backends process
blocked batches of eight interlaced elements and are intended for meshes with higher numbers
of elements.
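
For intuition, the interlacing idea can be sketched in NumPy (an illustrative model only, not the exact internal memory format of the blocked backends):

```python
import numpy as np

nelem, ndof, blk = 16, 4, 8
# E-vector-like data: one row of DOF values per element
evec = np.arange(nelem * ndof, dtype=np.float64).reshape(nelem, ndof)

# Interlaced blocked layout: group 8 elements so that a given DOF of all
# 8 elements is contiguous in memory, mapping naturally onto 8-wide SIMD
# lanes that process the elements of a block in lockstep.
blocked = np.ascontiguousarray(
    evec.reshape(nelem // blk, blk, ndof).transpose(0, 2, 1)
)

# DOF 0 of elements 0..7 now sits in one contiguous run.
assert np.array_equal(blocked[0, 0], evec[:blk, 0])
```
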

The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use, run your code with
Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP. They are based on
the `/gpu/cuda/*` backends.  ROCm version 3.6 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level
MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends
with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent
directory, or set `MAGMA_DIR` to the proper location.  MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is only built for either CUDA or HIP.  The corresponding
set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built
for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#`
after the resource name.  For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance. To enable the OCCA backends, the environment variable `OCCA_DIR` must
point to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib`.
By default, `OCCA_DIR` is set to `../occa`.

Additionally, users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends which are capable of generating reproducible results, with the proper compilation options, are highlighted in the list above.
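
The order sensitivity comes from floating-point addition not being associative, which a quick sketch demonstrates (generic IEEE 754 behavior, not specific to any libCEED backend):

```python
# atomicAdd leaves the reduction order unspecified; because floating-point
# addition is not associative, different orders can give different results.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]  # the 1.0 is absorbed by 1e16
reordered = ((vals[0] + vals[2]) + vals[1]) + vals[3]      # cancellation happens first

print(left_to_right, reordered)  # 1.0 vs. 2.0
```

A deterministic backend fixes the reduction order so repeated runs agree bit-for-bit.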

## Examples

libCEED comes with several examples of its usage, ranging from standalone C
codes in the `/examples/ceed` directory to examples based on external packages,
such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and
`NEK5K_DIR` variables and run:

```
cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
cd ceed/
make
./ex1-volume -ceed /cpu/self
./ex1-volume -ceed /gpu/cuda
./ex2-surface -ceed /cpu/self
./ex2-surface -ceed /gpu/cuda
cd ..

# MFEM+libCEED examples on CPU and GPU
cd mfem/
make
./bp1 -ceed /cpu/self -no-vis
./bp3 -ceed /gpu/cuda -no-vis
cd ..

# Nek5000+libCEED examples on CPU and GPU
cd nek/
make
./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
cd ..

# PETSc+libCEED examples on CPU and GPU
cd petsc/
make
./bps -problem bp1 -ceed /cpu/self
./bps -problem bp2 -ceed /gpu/cuda
./bps -problem bp3 -ceed /cpu/self
./bps -problem bp4 -ceed /gpu/cuda
./bps -problem bp5 -ceed /cpu/self
./bps -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpsraw -problem bp1 -ceed /cpu/self
./bpsraw -problem bp2 -ceed /gpu/cuda
./bpsraw -problem bp3 -ceed /cpu/self
./bpsraw -problem bp4 -ceed /gpu/cuda
./bpsraw -problem bp5 -ceed /cpu/self
./bpsraw -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpssphere -problem bp1 -ceed /cpu/self
./bpssphere -problem bp2 -ceed /gpu/cuda
./bpssphere -problem bp3 -ceed /cpu/self
./bpssphere -problem bp4 -ceed /gpu/cuda
./bpssphere -problem bp5 -ceed /cpu/self
./bpssphere -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./area -problem cube -ceed /cpu/self -degree 3
./area -problem cube -ceed /gpu/cuda -degree 3
./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
cd ..

cd fluids/
make
./navierstokes -ceed /cpu/self -degree 1
./navierstokes -ceed /gpu/cuda -degree 1
cd ..

cd solids/
make
./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
cd ..
```

For the last example shown, sample meshes to be used in place of
`[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends
enabled. Depending on the available backends, other CEED resource
specifiers can be provided with the `-ceed` option. Other command line
arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```
make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory
and can be viewed using the following commands (requires Python with Matplotlib):

```
cd benchmarks
python postprocess-plot.py petsc-bps-bp1-*-output.txt
python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

The `benchmarks` target runs a comprehensive set of benchmarks, which may take
some time. Subsets of the benchmarks can be run using the scripts in the
`benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```
make install prefix=/usr/local
```

or (e.g., if creating packages):

```
make install prefix=/usr DESTDIR=/packaging/path
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags
for all languages can be set using the likes of `OPT='-O3 -march=native'`. Use
`STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```
pip install libceed
```

with any desired pip options, such as `--user`.

### pkg-config

In addition to the library and headers, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config)
file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if
`$prefix` is a standard location or you set the environment variable
`PKG_CONFIG_PATH`:

```
cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED.  This can be used with the source or
installed directories.  Most build systems have support for pkg-config.

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov)
or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you utilize libCEED, please cite:

```
@article{libceed-joss-paper,
  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal      = {Journal of Open Source Software},
  year         = {2021},
  publisher    = {The Open Journal},
  volume       = {6},
  number       = {63},
  pages        = {2945},
  doi          = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author       = {Abdelfattah, Ahmad and
                  Barra, Valeria and
                  Beams, Natalie and
                  Brown, Jed and
                  Camier, Jean-Sylvain and
                  Dobrev, Veselin and
                  Dudouit, Yohann and
                  Ghaffari, Leila and
                  Kolev, Tzanio and
                  Medina, David and
                  Pazner, Will and
                  Ratnayaka, Thilina and
                  Thompson, Jeremy L and
                  Tomov, Stanimire},
  title        = {{libCEED} User Manual},
  month        = jul,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {0.9.0},
  doi          = {10.5281/zenodo.5077489}
}
```

For libCEED's Python interface, please cite:

```
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the
`doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless
otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.

[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
[github-link]: https://github.com/CEED/libCEED/actions
[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
[azure-badge]: https://dev.azure.com/CEED-ECP/libCEED/_apis/build/status/CEED.libCEED?branchName=main
[azure-link]: https://dev.azure.com/CEED-ECP/libCEED/_build?definitionId=2
[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
[codecov-link]: https://codecov.io/gh/CEED/libCEED/
[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
[license-link]: https://opensource.org/licenses/BSD-2-Clause
[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
[doc-link]: https://libceed.readthedocs.io/en/latest/?badge=latest
[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
[joss-link]: https://doi.org/10.21105/joss.02945
[binder-badge]: http://mybinder.org/badge_logo.svg
[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/python/tutorial-0-ceed.ipynb