# libCEED: Efficient Extensible Discretization

```{image} https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
:alt: GitHub Actions
:target: https://github.com/CEED/libCEED/actions
```

```{image} https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
:alt: GitLab-CI
:target: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
```

```{image} https://dev.azure.com/CEED-ECP/libCEED/_apis/build/status/CEED.libCEED?branchName=main
:alt: Azure Pipelines
:target: https://dev.azure.com/CEED-ECP/libCEED/_build?definitionId=2
```

```{image} https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
:alt: Code Coverage
:target: https://codecov.io/gh/CEED/libCEED/
```

```{image} https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
:alt: License
:target: https://opensource.org/licenses/BSD-2-Clause
```

```{image} https://readthedocs.org/projects/libceed/badge/?version=latest
:alt: Read the Docs
:target: https://libceed.readthedocs.io/en/latest/?badge=latest
```

```{image} https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
:alt: JOSS
:target: https://doi.org/10.21105/joss.02945
```

```{image} http://mybinder.org/badge_logo.svg
:alt: Binder
:target: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/tutorials/tutorial-0-ceed.ipynb
```

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for
performance portability, run-time flexibility, and clean embedding in higher
level libraries and applications. It offers a C99 interface as well as bindings
for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [user manual](https://libceed.readthedocs.io/en/latest/) and
API implementation portion of the
[documentation](https://libceed.readthedocs.io/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, both with
respect to the FLOPs needed for its evaluation and the memory transfer
needed for a matvec. Thus, high-order methods require a new "format" that still
represents a linear (or more generally non-linear) operator, but not through a
sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.).
This new operator description is based on algebraically
[factored form](https://libceed.readthedocs.io/en/latest/libCEEDapi/#finite-element-operator-decomposition),
which is easy to incorporate in a wide variety of applications, without significant
refactoring of their own discretization infrastructure.

The repository is part of the
[CEED software suite](http://ceed.exascaleproject.org/software/), a collection of
software benchmarks, miniapps, libraries and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the
[Exascale Computing Project](https://exascaleproject.org/exascale-computing-project)
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a
[capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering and early testbed
platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.readthedocs.io/en/latest/).
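
As a brief illustration of the factored form mentioned above, the operator decomposition described in the linked documentation can be summarized as follows. This is only a sketch in that documentation's notation; the user manual remains the authoritative description.

$$
A = P^T G^T B^T D B G P
$$

Here $P$ is the parallel (subdomain) restriction, $G$ is the element restriction, $B$ is the basis evaluation that maps element degrees of freedom to quadrature points, and $D$ is the pointwise operator applied at quadrature points. Applying $A$ to a vector amounts to applying these factors in sequence rather than multiplying by an assembled sparse matrix, which is why no global matrix needs to be stored.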

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies, and
with Fortran, Python, Julia, and Rust interfaces. It can be built using:

```
make
```

or, with optimization flags:

```
make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran) and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```
make AVX=1
```

or:

```
make AVX=0
```

if your compiler does not support gcc-style options, if you are cross
compiling, etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory
to your `make` invocation. To enable HIP support, add `HIP_DIR=/opt/rocm` or
an appropriate directory.
To store these or other arguments as defaults for
future invocations of `make`, use:

```
make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```
pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```
$ julia
julia> ]
pkg> add LibCEED
```

in the Julia package manager or in a clone of the repository via:

```
JULIA_LIBCEED_LIB=/path/to/libceed.so julia
julia> # press ] to enter package manager
(env) pkg> build LibCEED
```

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```
make test
```

or, using the `prove` tool distributed with Perl (recommended):

```
make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

```{eval-rst}
+----------------------------+---------------------------------------------------+-----------------------+
| CEED resource              | Backend                                           | Deterministic Capable |
+----------------------------+---------------------------------------------------+-----------------------+
| CPU Native Backends                                                                                    |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/ref/serial``   | Serial reference implementation                   | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/ref/blocked``  | Blocked reference implementation                  | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/opt/serial``   | Serial optimized C implementation                 | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/opt/blocked``  | Blocked optimized C implementation                | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/avx/serial``   | Serial AVX implementation                         | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/avx/blocked``  | Blocked AVX implementation                        | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| CPU Valgrind Backends                                                                                  |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/memcheck/*``   | Memcheck backends, undefined value checks         | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| CPU LIBXSMM Backends                                                                                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/xsmm/serial``  | Serial LIBXSMM implementation                     | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/xsmm/blocked`` | Blocked LIBXSMM implementation                    | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| CUDA Native Backends                                                                                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/ref``          | Reference pure CUDA kernels                       | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/shared``       | Optimized pure CUDA kernels using shared memory   | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/gen``          | Optimized pure CUDA kernels using code generation | No                    |
+----------------------------+---------------------------------------------------+-----------------------+
| HIP Native Backends                                                                                    |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/ref``           | Reference pure HIP kernels                        | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/shared``        | Optimized pure HIP kernels using shared memory    | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/gen``           | Optimized pure HIP kernels using code generation  | No                    |
+----------------------------+---------------------------------------------------+-----------------------+
| MAGMA Backends                                                                                         |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/magma``        | CUDA MAGMA kernels                                | No                    |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/magma/det``    | CUDA MAGMA kernels                                | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/magma``         | HIP MAGMA kernels                                 | No                    |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/magma/det``     | HIP MAGMA kernels                                 | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| OCCA Backends                                                                                          |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/*/occa``                | Selects backend based on available OCCA modes     | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/self/occa``         | OCCA backend with serial CPU kernels              | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/cpu/openmp/occa``       | OCCA backend with OpenMP kernels                  | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/cuda/occa``         | OCCA backend with CUDA kernels                    | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
| ``/gpu/hip/occa``          | OCCA backend with HIP kernels                     | Yes                   |
+----------------------------+---------------------------------------------------+-----------------------+
```

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes
with a smaller number of high-order elements. The `/cpu/self/*/blocked` backends process
blocked batches of eight interlaced elements and are intended for meshes with higher numbers
of elements.

The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use, run your code with
Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in the serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP. They are based on
the `/gpu/cuda/*` backends. ROCm version 3.6 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level
MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends
with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent
directory, or set `MAGMA_DIR` to the proper location. MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is only built for either CUDA or HIP. The corresponding
set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built
for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#`
after the resource name. For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance. To enable the OCCA backend, the environment variable `OCCA_DIR` must point
to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib` (by default,
`OCCA_DIR` is set to `../occa`).

Additionally, users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends which are capable of generating reproducible results, with the proper compilation options, are marked in the "Deterministic Capable" column of the table above.

## Examples

libCEED comes with several examples of its usage, ranging from standalone C
codes in the `/examples/ceed` directory to examples based on external packages,
such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and
`NEK5K_DIR` variables and run:

```
cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
cd ceed/
make
./ex1-volume -ceed /cpu/self
./ex1-volume -ceed /gpu/cuda
./ex2-surface -ceed /cpu/self
./ex2-surface -ceed /gpu/cuda
cd ..

# MFEM+libCEED examples on CPU and GPU
cd mfem/
make
./bp1 -ceed /cpu/self -no-vis
./bp3 -ceed /gpu/cuda -no-vis
cd ..

# Nek5000+libCEED examples on CPU and GPU
cd nek/
make
./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
cd ..

# PETSc+libCEED examples on CPU and GPU
cd petsc/
make
./bps -problem bp1 -ceed /cpu/self
./bps -problem bp2 -ceed /gpu/cuda
./bps -problem bp3 -ceed /cpu/self
./bps -problem bp4 -ceed /gpu/cuda
./bps -problem bp5 -ceed /cpu/self
./bps -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpsraw -problem bp1 -ceed /cpu/self
./bpsraw -problem bp2 -ceed /gpu/cuda
./bpsraw -problem bp3 -ceed /cpu/self
./bpsraw -problem bp4 -ceed /gpu/cuda
./bpsraw -problem bp5 -ceed /cpu/self
./bpsraw -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpssphere -problem bp1 -ceed /cpu/self
./bpssphere -problem bp2 -ceed /gpu/cuda
./bpssphere -problem bp3 -ceed /cpu/self
./bpssphere -problem bp4 -ceed /gpu/cuda
./bpssphere -problem bp5 -ceed /cpu/self
./bpssphere -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./area -problem cube -ceed /cpu/self -degree 3
./area -problem cube -ceed /gpu/cuda -degree 3
./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
cd ..

cd fluids/
make
./navierstokes -ceed /cpu/self -degree 1
./navierstokes -ceed /gpu/cuda -degree 1
cd ..

cd solids/
make
./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
cd ..
```

For the last example shown, sample meshes to be used in place of
`[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends
enabled. Depending on the available backends, other CEED resource
specifiers can be provided with the `-ceed` option. Other command line
arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).
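
Since the same example binary accepts different resource specifiers through `-ceed`, it can be convenient to loop over several of them. The following is a minimal sketch of such a loop; the helper name `run_on_resources` is hypothetical and not part of libCEED, and it only assumes that the example takes `-ceed <resource>` as shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of libCEED): run one example binary
# against several CEED resources and report any resource that fails.
# Usage: run_on_resources <example-binary> <resource>...
run_on_resources() {
  example=$1
  shift
  status=0
  for resource in "$@"; do
    echo "== $example -ceed $resource"
    if ! "$example" -ceed "$resource"; then
      echo "FAILED: $resource"
      status=1
    fi
  done
  return "$status"
}
```

For instance, from `examples/ceed/` one might call `run_on_resources ./ex1-volume /cpu/self/ref/serial /cpu/self/opt/blocked`, listing only resources for backends that were actually built.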

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```
make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory
and they can be viewed using the commands (requires python with matplotlib):

```
cd benchmarks
python postprocess-plot.py petsc-bps-bp1-*-output.txt
python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks which may
take some time to run. Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```
make install prefix=/usr/local
```

or (e.g., if creating packages):

```
make install prefix=/usr DESTDIR=/packaging/path
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags
for all languages can be set using the likes of `OPT='-O3 -march=native'`. Use
`STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```
pip install libceed
```

with the desired setuptools options, such as `--user`.

### pkg-config

In addition to library and header, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config)
file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if
`$prefix` is a standard location or you set the environment variable
`PKG_CONFIG_PATH`:

```
cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED. This can be used with the source or
installed directories. Most build systems have support for pkg-config.

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov)
or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you utilize libCEED please cite:

```
@article{libceed-joss-paper,
  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal      = {Journal of Open Source Software},
  year         = {2021},
  publisher    = {The Open Journal},
  volume       = {6},
  number       = {63},
  pages        = {2945},
  doi          = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author       = {Abdelfattah, Ahmad and
                  Barra, Valeria and
                  Beams, Natalie and
                  Brown, Jed and
                  Camier, Jean-Sylvain and
                  Dobrev, Veselin and
                  Dudouit, Yohann and
                  Ghaffari, Leila and
                  Kolev, Tzanio and
                  Medina, David and
                  Pazner, Will and
                  Ratnayaka, Thilina and
                  Thompson, Jeremy L and
                  Tomov, Stanimire},
  title        = {{libCEED} User Manual},
  month        = jul,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {0.9.0},
  doi          = {10.5281/zenodo.5077489}
}
```

For libCEED's Python interface please cite:

```
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the
`doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless
otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.