# libCEED: Efficient Extensible Discretization

```{image} https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
:alt: GitHub Actions
:target: https://github.com/CEED/libCEED/actions
```

```{image} https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
:alt: GitLab-CI
:target: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
```

```{image} https://dev.azure.com/CEED-ECP/libCEED/_apis/build/status/CEED.libCEED?branchName=main
:alt: Azure Pipelines
:target: https://dev.azure.com/CEED-ECP/libCEED/_build?definitionId=2
```

```{image} https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
:alt: Code Coverage
:target: https://codecov.io/gh/CEED/libCEED/
```

```{image} https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
:alt: License
:target: https://opensource.org/licenses/BSD-2-Clause
```

```{image} https://readthedocs.org/projects/libceed/badge/?version=latest
:alt: Read the Docs
:target: https://libceed.readthedocs.io/en/latest/?badge=latest
```

```{image} https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
:alt: JOSS
:target: https://doi.org/10.21105/joss.02945
```

```{image} http://mybinder.org/badge_logo.svg
:alt: Binder
:target: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/tutorials/tutorial-0-ceed.ipynb
```

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for
performance portability, run-time flexibility, and clean embedding in
higher-level libraries and applications. It offers a C99 interface as well as bindings
for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [user manual](https://libceed.readthedocs.io/en/latest/) and
the API implementation portion of the
[documentation](https://libceed.readthedocs.io/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, with respect to
both the FLOPs needed for its evaluation and the memory transfer
needed for a matvec.  Thus, high-order methods require a new "format" that still
represents a linear (or more generally non-linear) operator, but not through a
sparse matrix.

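The cost argument can be made concrete with a back-of-the-envelope count. The C sketch below is not libCEED code and its function names are hypothetical; it assumes scalar, continuous, tensor-product elements of degree `p` in `dim` dimensions. A node interior to one element couples to every node of that element, so its assembled row holds `(p+1)^dim` nonzeros, while a matrix-free operator only needs a small, degree-independent number of values at each of the `q^dim` quadrature points of an element. For degree 7 hexahedra, an interior row already holds 512 nonzeros.

```c
/* Integer power for small non-negative exponents. */
static long ipow(long base, int exp) {
  long result = 1;
  for (int i = 0; i < exp; i++) result *= base;
  return result;
}

/* Nonzeros in the assembled-matrix row of a node interior to one element:
   such a node couples to every node of that element, i.e. (p+1)^dim entries. */
long nnz_in_interior_row(int p, int dim) { return ipow(p + 1, dim); }

/* Entries of one dense element matrix: ((p+1)^dim)^2. */
long element_matrix_entries(int p, int dim) { return ipow(p + 1, 2 * dim); }

/* Quadrature points per element with q points per dimension. */
long quadrature_points(int q, int dim) { return ipow(q, dim); }
```
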
The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on algebraically
[factored form](https://libceed.readthedocs.io/en/latest/libCEEDapi/#finite-element-operator-decomposition),
which is easy to incorporate in a wide variety of applications, without significant
refactoring of their own discretization infrastructure.

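As a rough illustration of what "factored form" means, the following C sketch applies a 1D mass operator without ever forming a matrix. It is a toy written for this README, not the libCEED API: the function name, the choice of linear elements on a uniform mesh, and the two-point Gauss rule are all assumptions made for illustration. The operator is applied as an element restriction `G`, a basis evaluation `B`, a pointwise scaling `D` at quadrature points, and then the transposes `B^T` and `G^T`; the parallel restriction that a distributed code would add is omitted. Applying it to a vector of ones should return nodal values that sum to the length of the domain.

```c
#include <stddef.h>

/* Apply a 1D mass operator in factored form, v = G^T B^T D B G u, on a uniform
   mesh of `nelem` linear elements of width `h` (nelem + 1 nodes).
   G gathers the two nodes of an element, B interpolates to two Gauss points,
   and D scales by quadrature weight times the Jacobian h/2. */
void apply_mass_1d(size_t nelem, double h, const double *u, double *v) {
  const double g = 0.5773502691896257; /* 1/sqrt(3), Gauss point on [-1, 1] */
  const double B[2][2] = {
      {0.5 * (1.0 + g), 0.5 * (1.0 - g)}, /* basis values at xi = -g */
      {0.5 * (1.0 - g), 0.5 * (1.0 + g)}, /* basis values at xi = +g */
  };
  const double D = 1.0 * (0.5 * h); /* weight 1 times Jacobian h/2 */

  for (size_t i = 0; i <= nelem; i++) v[i] = 0.0;
  for (size_t e = 0; e < nelem; e++) {
    const double ue[2] = {u[e], u[e + 1]}; /* G: global -> element */
    double ve[2] = {0.0, 0.0};
    for (int q = 0; q < 2; q++) {
      double uq = B[q][0] * ue[0] + B[q][1] * ue[1]; /* B: nodes -> quadrature */
      double vq = D * uq;                            /* D: pointwise scaling */
      ve[0] += B[q][0] * vq;                         /* B^T: quadrature -> nodes */
      ve[1] += B[q][1] * vq;
    }
    v[e] += ve[0]; /* G^T: element -> global, summing shared nodes */
    v[e + 1] += ve[1];
  }
}
```
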
The repository is part of the
[CEED software suite](http://ceed.exascaleproject.org/software/), a collection of
software benchmarks, miniapps, libraries and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the
[Exascale Computing Project](https://exascaleproject.org/exascale-computing-project)
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a
[capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering and early testbed
platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.readthedocs.io/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies, and
with Fortran, Python, Julia, and Rust interfaces.  It can be built using:

```
make
```

or, with optimization flags:

```
make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran) and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```
make AVX=1
```

or:

```
make AVX=0
```

if your compiler does not support gcc-style options, if you are
cross-compiling, etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory
to your `make` invocation. To enable HIP support, add `HIP_DIR=/opt/rocm` or
an appropriate directory. To store these or other arguments as defaults for
future invocations of `make`, use:

```
make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```
pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```
$ julia
julia> ]
pkg> add LibCEED
```

in the Julia package manager or in a clone of the repository via:

```
JULIA_LIBCEED_LIB=/path/to/libceed.so julia
julia> # press ] to enter package manager
(env) pkg> build LibCEED
```

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```
make test
```

or, using the `prove` tool distributed with Perl (recommended):

```
make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

```{eval-rst}
+----------------------------+---------------------------------------------------+-----------------------+
| CEED resource              | Backend                                           | Deterministic Capable |
+----------------------------+---------------------------------------------------+-----------------------+
| CPU Native Backends                                                                                    |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/ref/serial``   | Serial reference implementation                   | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/ref/blocked``  | Blocked reference implementation                  | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/opt/serial``   | Serial optimized C implementation                 | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/opt/blocked``  | Blocked optimized C implementation                | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/avx/serial``   | Serial AVX implementation                         | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/avx/blocked``  | Blocked AVX implementation                        | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| CPU Valgrind Backends                                                                                  |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/memcheck/*``   | Memcheck backends, undefined value checks         | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| CPU LIBXSMM Backends                                                                                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/xsmm/serial``  | Serial LIBXSMM implementation                     | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/xsmm/blocked`` | Blocked LIBXSMM implementation                    | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| CUDA Native Backends                                                                                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/ref``          | Reference pure CUDA kernels                       | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/shared``       | Optimized pure CUDA kernels using shared memory   | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/gen``          | Optimized pure CUDA kernels using code generation | No                    |
+----------------------------+---------------------------------------------------+-----------------------+
| HIP Native Backends                                                                                    |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/ref``           | Reference pure HIP kernels                        | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/shared``        | Optimized pure HIP kernels using shared memory    | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/gen``           | Optimized pure HIP kernels using code generation  | No                    |
+----------------------------+---------------------------------------------------+-----------------------+
| MAGMA Backends                                                                                         |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/magma``        | CUDA MAGMA kernels                                | No                    |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/magma/det``    | CUDA MAGMA kernels                                | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/magma``         | HIP MAGMA kernels                                 | No                    |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/magma/det``     | HIP MAGMA kernels                                 | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| OCCA Backends                                                                                          |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/*/occa``                | Selects backend based on available OCCA modes     | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/occa``         | OCCA backend with serial CPU kernels              | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/openmp/occa``       | OCCA backend with OpenMP kernels                  | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/occa``         | OCCA backend with CUDA kernels                    | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/occa``          | OCCA backend with HIP kernels                     | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
```

257*bcb2dfaeSJed Brown
258*bcb2dfaeSJed BrownThe `/cpu/self/*/serial` backends process one element at a time and are intended for meshes
259*bcb2dfaeSJed Brownwith a smaller number of high order elements. The `/cpu/self/*/blocked` backends process
260*bcb2dfaeSJed Brownblocked batches of eight interlaced elements and are intended for meshes with higher numbers
261*bcb2dfaeSJed Brownof elements.
262*bcb2dfaeSJed Brown
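One way to picture "batches of eight interlaced elements" is as an index calculation. The C sketch below is illustrative only: the function name and the exact array layout are assumptions, not a description of libCEED's internal storage. It places the eight elements of a batch in the fastest-varying index, so that the same node of all eight elements is contiguous in memory and can be processed together by vector instructions.

```c
#include <stddef.h>

#define BLOCK_SIZE 8 /* elements per batch, as described above */

/* Offset of node `node` of element `elem` in a blocked, interlaced array in
   which each element has `elem_size` nodes. The eight elements of a batch are
   the fastest-varying index, so one node of all eight is contiguous. */
size_t blocked_offset(size_t elem, size_t node, size_t elem_size) {
  size_t block = elem / BLOCK_SIZE; /* which batch of eight */
  size_t lane = elem % BLOCK_SIZE;  /* position within the batch */
  return (block * elem_size + node) * BLOCK_SIZE + lane;
}
```
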
The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use, run your code with
Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck/serial`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in the serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP. They are based on
the `/gpu/cuda/*` backends.  ROCm version 3.6 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level
MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends
with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent
directory, or set `MAGMA_DIR` to the proper location.  MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is only built for either CUDA or HIP.  The corresponding
set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built
for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#`
after the resource name.  For example:

> - `/gpu/cuda/gen:device_id=1`

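The shape of such a resource string (a backend path followed by a `:device_id=#` suffix) can be sketched in a few lines of C. The helper below is hypothetical and is not how libCEED parses resources; it only shows what an application that builds or inspects these strings might do, returning a caller-supplied fallback when no device is named.

```c
#include <stdlib.h>
#include <string.h>

/* Return the integer following ":device_id=" in `resource`, or
   `fallback` when the suffix is absent or not followed by a number. */
int resource_device_id(const char *resource, int fallback) {
  const char *key = ":device_id=";
  const char *pos = strstr(resource, key);
  if (!pos) return fallback;
  pos += strlen(key);
  char *end = NULL;
  long id = strtol(pos, &end, 10);
  if (end == pos) return fallback; /* no digits after '=' */
  return (int)id;
}
```
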
The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance. To enable the OCCA backend, the environment variable `OCCA_DIR` must point
to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib` (by default,
`OCCA_DIR` is set to `../occa`).

Additionally, users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends that are capable of generating reproducible results, with the proper compilation options, are marked in the "Deterministic Capable" column of the table above.

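The reason unordered accumulation breaks reproducibility is that floating-point addition is not associative, so the order in which contributions arrive changes the rounded result. The C sketch below is a generic demonstration on the CPU, not libCEED or GPU code, and the function name is hypothetical. With IEEE double precision, summing `{1e20, 1.0, -1e20}` in the order given yields `0.0`, while adding the two large values first yields `1.0`.

```c
/* Sum x[order[0]], x[order[1]], ... in exactly that sequence. */
double sum_in_order(const double *x, const int *order, int n) {
  double s = 0.0;
  for (int i = 0; i < n; i++) s += x[order[i]];
  return s;
}
```
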
## Examples

libCEED comes with several examples of its usage, ranging from standalone C
codes in the `/examples/ceed` directory to examples based on external packages,
such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and
`NEK5K_DIR` variables and run:

```
cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
cd ceed/
make
./ex1-volume -ceed /cpu/self
./ex1-volume -ceed /gpu/cuda
./ex2-surface -ceed /cpu/self
./ex2-surface -ceed /gpu/cuda
cd ..

# MFEM+libCEED examples on CPU and GPU
cd mfem/
make
./bp1 -ceed /cpu/self -no-vis
./bp3 -ceed /gpu/cuda -no-vis
cd ..

# Nek5000+libCEED examples on CPU and GPU
cd nek/
make
./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
cd ..

# PETSc+libCEED examples on CPU and GPU
cd petsc/
make
./bps -problem bp1 -ceed /cpu/self
./bps -problem bp2 -ceed /gpu/cuda
./bps -problem bp3 -ceed /cpu/self
./bps -problem bp4 -ceed /gpu/cuda
./bps -problem bp5 -ceed /cpu/self
./bps -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpsraw -problem bp1 -ceed /cpu/self
./bpsraw -problem bp2 -ceed /gpu/cuda
./bpsraw -problem bp3 -ceed /cpu/self
./bpsraw -problem bp4 -ceed /gpu/cuda
./bpsraw -problem bp5 -ceed /cpu/self
./bpsraw -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpssphere -problem bp1 -ceed /cpu/self
./bpssphere -problem bp2 -ceed /gpu/cuda
./bpssphere -problem bp3 -ceed /cpu/self
./bpssphere -problem bp4 -ceed /gpu/cuda
./bpssphere -problem bp5 -ceed /cpu/self
./bpssphere -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./area -problem cube -ceed /cpu/self -degree 3
./area -problem cube -ceed /gpu/cuda -degree 3
./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
cd ..

cd fluids/
make
./navierstokes -ceed /cpu/self -degree 1
./navierstokes -ceed /gpu/cuda -degree 1
cd ..

cd solids/
make
./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
cd ..
```

For the last example shown, sample meshes to be used in place of
`[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends
enabled. Depending on the available backends, other CEED resource
specifiers can be provided with the `-ceed` option. Other command line
arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```
make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory
and they can be viewed using the commands (requires Python with matplotlib):

```
cd benchmarks
python postprocess-plot.py petsc-bps-bp1-*-output.txt
python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks which may
take some time to run. Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```
make install prefix=/usr/local
```

or (e.g., if creating packages):

```
make install prefix=/usr DESTDIR=/packaging/path
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags
for all languages can be set using the likes of `OPT='-O3 -march=native'`. Use
`STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```
pip install libceed
```

with the desired pip options, such as `--user`.

### pkg-config

In addition to the library and header, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config)
file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if
`$prefix` is a standard location or you set the environment variable
`PKG_CONFIG_PATH`:

```
cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED.  This can be used with the source or
installed directories.  Most build systems have support for pkg-config.

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov)
or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you use libCEED, please cite:

```
@article{libceed-joss-paper,
  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal      = {Journal of Open Source Software},
  year         = {2021},
  publisher    = {The Open Journal},
  volume       = {6},
  number       = {63},
  pages        = {2945},
  doi          = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author       = {Abdelfattah, Ahmad and
                  Barra, Valeria and
                  Beams, Natalie and
                  Brown, Jed and
                  Camier, Jean-Sylvain and
                  Dobrev, Veselin and
                  Dudouit, Yohann and
                  Ghaffari, Leila and
                  Kolev, Tzanio and
                  Medina, David and
                  Pazner, Will and
                  Ratnayaka, Thilina and
                  Thompson, Jeremy L and
                  Tomov, Stanimire},
  title        = {{libCEED} User Manual},
  month        = jul,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {0.9.0},
  doi          = {10.5281/zenodo.5077489}
}
```

For libCEED's Python interface, please cite:

```
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the
`doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless
otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.