# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![User manual][zenodo-badge]][zenodo-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for performance portability, run-time flexibility, and clean embedding in higher-level libraries and applications.
It offers a C99 interface as well as bindings for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly algebraic and thus applicable to other discretizations in factored form, as explained in the [user manual](https://libceed.org/en/latest/) and the API implementation portion of the [documentation](https://libceed.org/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is no longer a good representation of a high-order linear operator, both with respect to the FLOPs needed for its evaluation and the memory transfer needed for a matvec.
Thus, high-order methods require a new "format" that still represents a linear (or more generally non-linear) operator, but not through a sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting implementations and data structures, that enable efficient operator evaluation on a variety of computational device types (CPUs, GPUs, etc.).
This new operator description is based on an algebraically [factored form](https://libceed.org/en/latest/libCEEDapi/#finite-element-operator-decomposition), which is easy to incorporate in a wide variety of applications without significant refactoring of their own discretization infrastructure.
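As an orientation aid (the notation here loosely follows the operator decomposition in the user manual; see the link above for the full definition), the factored form expresses a global operator `A` not as an assembled sparse matrix but as a composition of simple factors that are each applied matrix-free:

```math
A = P^T G^T B^T D B G P
```

where `P` is the parallel (inter-process) restriction, `G` the element restriction (gather/scatter between the local vector and element degrees of freedom), `B` the basis evaluation at quadrature points, and `D` the pointwise application of the user's quadrature-point function (QFunction). libCEED's backends optimize the application of these factors for each device type.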

The repository is part of the [CEED software suite](http://ceed.exascaleproject.org/software/), a collection of software benchmarks, miniapps, libraries, and APIs for efficient exascale discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the [Exascale Computing Project](https://exascaleproject.org/exascale-computing-project) (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a [capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation's exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.org/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies and with Fortran, Python, Julia, and Rust interfaces.
It can be built using:

```console
$ make
```

or, with optimization flags:

```console
$ make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran), and this makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```console
$ make AVX=1
```

or:

```console
$ make AVX=0
```

if your compiler does not support gcc-style options, if you are cross compiling, etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory to your `make` invocation.
To enable HIP support, add `HIP_DIR=/opt/rocm` or an appropriate directory.
To store these or other arguments as defaults for future invocations of `make`, use:

```console
$ make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

### WebAssembly

libCEED can be built for WebAssembly (WASM) using [Emscripten](https://emscripten.org). For example, one can build the library and run a standalone WASM executable using:

```console
$ emmake make build/ex2-surface.wasm
$ wasmer build/ex2-surface.wasm -- -s 200000
```

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```console
$ pip install libceed
```

or, in a clone of the repository, via `pip install .`.

Julia users can install using:

```console
$ julia
julia> ]
pkg> add LibCEED
```

See the [LibCEED.jl documentation](http://ceed.exascaleproject.org/libCEED-julia-docs/dev/) for more information.

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = "0.11.0"
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```console
$ make test
```

or, using the `prove` tool distributed with Perl (recommended):

```console
$ make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource | Backend | Deterministic Capable |
| :--- | :--- | :---: |
||
| **CPU Native** |
| `/cpu/self/ref/serial` | Serial reference implementation | Yes |
| `/cpu/self/ref/blocked` | Blocked reference implementation | Yes |
| `/cpu/self/opt/serial` | Serial optimized C implementation | Yes |
| `/cpu/self/opt/blocked` | Blocked optimized C implementation | Yes |
| `/cpu/self/avx/serial` | Serial AVX implementation | Yes |
| `/cpu/self/avx/blocked` | Blocked AVX implementation | Yes |
||
| **CPU Valgrind** |
| `/cpu/self/memcheck/*` | Memcheck backends, undefined value checks | Yes |
||
| **CPU LIBXSMM** |
| `/cpu/self/xsmm/serial` | Serial LIBXSMM implementation | Yes |
| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation | Yes |
||
| **CUDA Native** |
| `/gpu/cuda/ref` | Reference pure CUDA kernels | Yes |
| `/gpu/cuda/shared` | Optimized pure CUDA kernels using shared memory | Yes |
| `/gpu/cuda/gen` | Optimized pure CUDA kernels using code generation | No |
||
| **HIP Native** |
| `/gpu/hip/ref` | Reference pure HIP kernels | Yes |
| `/gpu/hip/shared` | Optimized pure HIP kernels using shared memory | Yes |
| `/gpu/hip/gen` | Optimized pure HIP kernels using code generation | No |
||
| **MAGMA** |
| `/gpu/cuda/magma` | CUDA MAGMA kernels | No |
| `/gpu/cuda/magma/det` | CUDA MAGMA kernels | Yes |
| `/gpu/hip/magma` | HIP MAGMA kernels | No |
| `/gpu/hip/magma/det` | HIP MAGMA kernels | Yes |
||
| **OCCA** |
| `/*/occa` | Selects backend based on available OCCA modes | Yes |
| `/cpu/self/occa` | OCCA backend with serial CPU kernels | Yes |
| `/cpu/openmp/occa` | OCCA backend with OpenMP kernels | Yes |
| `/cpu/dpcpp/occa` | OCCA backend with DPC++ kernels | Yes |
| `/gpu/cuda/occa` | OCCA backend with CUDA kernels | Yes |
| `/gpu/hip/occa` | OCCA backend with HIP kernels | Yes |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes with a smaller number of high-order elements.
The `/cpu/self/*/blocked` backends process blocked batches of eight interlaced elements and are intended for meshes with higher numbers of elements.

The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool to help verify that user QFunctions have no undefined values.
To use, run your code with Valgrind and the Memcheck backends, e.g., `valgrind ./build/ex1 -ceed /cpu/self/ref/memcheck`.
A 'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in the serial mode if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package to provide vectorized CPU performance.
If linking MKL and LIBXSMM is desired but the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP.
They are based on the `/gpu/cuda/*` backends.
ROCm version 4.2 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent directory, or set `MAGMA_DIR` to the proper location.
MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is built for either CUDA or HIP, but not both.
The corresponding set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#` after the resource name.
For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide cross-platform performance.
To enable the OCCA backends, the environment variable `OCCA_DIR` must point to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib` (by default, `OCCA_DIR` is set to `../occa`).
OCCA version 1.4.0 or newer is required.

Users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends that are capable of generating reproducible results, with the proper compilation options, are highlighted in the list above.

## Examples

libCEED comes with several examples of its usage, ranging from standalone C codes in the `/examples/ceed` directory to examples based on external packages, such as MFEM, PETSc, and Nek5000.
Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and `NEK5K_DIR` variables and run:

```console
$ cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
$ cd ceed/
$ make
$ ./ex1-volume -ceed /cpu/self
$ ./ex1-volume -ceed /gpu/cuda
$ ./ex2-surface -ceed /cpu/self
$ ./ex2-surface -ceed /gpu/cuda
$ cd ..

# MFEM+libCEED examples on CPU and GPU
$ cd mfem/
$ make
$ ./bp1 -ceed /cpu/self -no-vis
$ ./bp3 -ceed /gpu/cuda -no-vis
$ cd ..

# Nek5000+libCEED examples on CPU and GPU
$ cd nek/
$ make
$ ./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
$ ./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
$ cd ..

# PETSc+libCEED examples on CPU and GPU
$ cd petsc/
$ make
$ ./bps -problem bp1 -ceed /cpu/self
$ ./bps -problem bp2 -ceed /gpu/cuda
$ ./bps -problem bp3 -ceed /cpu/self
$ ./bps -problem bp4 -ceed /gpu/cuda
$ ./bps -problem bp5 -ceed /cpu/self
$ ./bps -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./bpsraw -problem bp1 -ceed /cpu/self
$ ./bpsraw -problem bp2 -ceed /gpu/cuda
$ ./bpsraw -problem bp3 -ceed /cpu/self
$ ./bpsraw -problem bp4 -ceed /gpu/cuda
$ ./bpsraw -problem bp5 -ceed /cpu/self
$ ./bpsraw -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./bpssphere -problem bp1 -ceed /cpu/self
$ ./bpssphere -problem bp2 -ceed /gpu/cuda
$ ./bpssphere -problem bp3 -ceed /cpu/self
$ ./bpssphere -problem bp4 -ceed /gpu/cuda
$ ./bpssphere -problem bp5 -ceed /cpu/self
$ ./bpssphere -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./area -problem cube -ceed /cpu/self -degree 3
$ ./area -problem cube -ceed /gpu/cuda -degree 3
$ ./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
$ ./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
$ cd ..

$ cd fluids/
$ make
$ ./navierstokes -ceed /cpu/self -degree 1
$ ./navierstokes -ceed /gpu/cuda -degree 1
$ cd ..

$ cd solids/
$ make
$ ./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
$ ./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
$ cd ..
```

For the last example shown, sample meshes to be used in place of `[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends enabled.
Depending on the available backends, other CEED resource specifiers can be provided with the `-ceed` option.
Other command line arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```console
$ make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory, and they can be viewed using the following commands (requires Python with matplotlib):

```console
$ cd benchmarks
$ python postprocess-plot.py petsc-bps-bp1-*-output.txt
$ python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks, which may take some time.
Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```console
$ make install prefix=/path/to/install/dir
```

or (e.g., if creating packages):

```console
$ make install prefix=/usr DESTDIR=/packaging/path
```

To build and install in separate steps, run:

```console
$ make for_install=1 prefix=/path/to/install/dir
$ make install prefix=/path/to/install/dir
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags for all languages can be set using the likes of `OPT='-O3 -march=native'`.
Use `STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```console
$ pip install libceed
```

with the desired setuptools options, such as `--user`.

### pkg-config

In addition to the library and headers, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config) file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if `$prefix` is a standard location or you set the environment variable `PKG_CONFIG_PATH`:

```console
$ cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED.
This can be used with the source or installed directories.
Most build systems have support for pkg-config.

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov) or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you utilize libCEED, please cite:

```bibtex
@article{libceed-joss-paper,
  author    = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title     = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal   = {Journal of Open Source Software},
  year      = {2021},
  publisher = {The Open Journal},
  volume    = {6},
  number    = {63},
  pages     = {2945},
  doi       = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author    = {Abdelfattah, Ahmad and
               Barra, Valeria and
               Beams, Natalie and
               Brown, Jed and
               Camier, Jean-Sylvain and
               Dobrev, Veselin and
               Dudouit, Yohann and
               Ghaffari, Leila and
               Kolev, Tzanio and
               Medina, David and
               Pazner, Will and
               Ratnayaka, Thilina and
               Shakeri, Rezgar and
               Thompson, Jeremy L and
               Tomov, Stanimire and
               Wright III, James},
  title     = {{libCEED} User Manual},
  month     = dec,
  year      = 2022,
  publisher = {Zenodo},
  version   = {0.11.0},
  doi       = {10.5281/zenodo.7480454}
}
```

For libCEED's Python interface, please cite:

```bibtex
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the `doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.

[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
[github-link]: https://github.com/CEED/libCEED/actions
[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
[codecov-link]: https://codecov.io/gh/CEED/libCEED/
[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
[license-link]: https://opensource.org/licenses/BSD-2-Clause
[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
[doc-link]: https://libceed.org/en/latest/?badge=latest
[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
[joss-link]: https://doi.org/10.21105/joss.02945
[binder-badge]: http://mybinder.org/badge_logo.svg
[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/python/tutorial-0-ceed.ipynb
[zenodo-badge]: https://zenodo.org/badge/DOI/10.5281/zenodo.svg
[zenodo-link]: https://doi.org/10.5281/zenodo.4302736