# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for
performance portability, run-time flexibility, and clean embedding in higher-level
libraries and applications. It offers a C99 interface as well as bindings for
Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [user manual](https://libceed.org/en/latest/) and the
API implementation portion of the
[documentation](https://libceed.org/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, both with
respect to the FLOPs needed for its evaluation and the memory transfer
needed for a matvec. Thus, high-order methods require a new "format" that still
represents a linear (or more generally nonlinear) operator, but not through a
sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on an algebraically
[factored form](https://libceed.org/en/latest/libCEEDapi/#finite-element-operator-decomposition),
which is easy to incorporate into a wide variety of applications without significant
refactoring of their own discretization infrastructure.

The repository is part of the
[CEED software suite](http://ceed.exascaleproject.org/software/), a collection of
software benchmarks, miniapps, libraries, and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the
[Exascale Computing Project](https://exascaleproject.org/exascale-computing-project)
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a
[capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering, and early testbed
platforms, in support of the nation's exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.org/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies and
with Fortran, Python, Julia, and Rust interfaces. It can be built using:

```
make
```

or, with optimization flags:

```
make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran), and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```
make AVX=1
```

or:

```
make AVX=0
```

if your compiler does not support gcc-style options, if you are cross-compiling, etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory
to your `make` invocation.
To enable HIP support, add `HIP_DIR=/opt/rocm` or
an appropriate directory. To store these or other arguments as defaults for
future invocations of `make`, use:

```
make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```
pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```
$ julia
julia> ]
pkg> add LibCEED
```

See the [LibCEED.jl documentation](http://ceed.exascaleproject.org/libCEED-julia-docs/dev/)
for more information.

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```
make test
```

or, using the `prove` tool distributed with Perl (recommended):

```
make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource | Backend | Deterministic Capable |
| :--- | :--- | :---: |
| **CPU Native** | | |
| `/cpu/self/ref/serial` | Serial reference implementation | Yes |
| `/cpu/self/ref/blocked` | Blocked reference implementation | Yes |
| `/cpu/self/opt/serial` | Serial optimized C implementation | Yes |
| `/cpu/self/opt/blocked` | Blocked optimized C implementation | Yes |
| `/cpu/self/avx/serial` | Serial AVX implementation | Yes |
| `/cpu/self/avx/blocked` | Blocked AVX implementation | Yes |
| **CPU Valgrind** | | |
| `/cpu/self/memcheck/*` | Memcheck backends, undefined value checks | Yes |
| **CPU LIBXSMM** | | |
| `/cpu/self/xsmm/serial` | Serial LIBXSMM implementation | Yes |
| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation | Yes |
| **CUDA Native** | | |
| `/gpu/cuda/ref` | Reference pure CUDA kernels | Yes |
| `/gpu/cuda/shared` | Optimized pure CUDA kernels using shared memory | Yes |
| `/gpu/cuda/gen` | Optimized pure CUDA kernels using code generation | No |
| **HIP Native** | | |
| `/gpu/hip/ref` | Reference pure HIP kernels | Yes |
| `/gpu/hip/shared` | Optimized pure HIP kernels using shared memory | Yes |
| `/gpu/hip/gen` | Optimized pure HIP kernels using code generation | No |
| **MAGMA** | | |
| `/gpu/cuda/magma` | CUDA MAGMA kernels | No |
| `/gpu/cuda/magma/det` | Deterministic CUDA MAGMA kernels | Yes |
| `/gpu/hip/magma` | HIP MAGMA kernels | No |
| `/gpu/hip/magma/det` | Deterministic HIP MAGMA kernels | Yes |
| **OCCA** | | |
| `/*/occa` | Selects backend based on available OCCA modes | Yes |
| `/cpu/self/occa` | OCCA backend with serial CPU kernels | Yes |
| `/cpu/openmp/occa` | OCCA backend with OpenMP kernels | Yes |
| `/gpu/cuda/occa` | OCCA backend with CUDA kernels | Yes |
| `/gpu/hip/occa` | OCCA backend with HIP kernels | Yes |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes
with a smaller number of high-order elements. The `/cpu/self/*/blocked` backends process
blocked batches of eight interlaced elements and are intended for meshes with higher numbers
of elements.

The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use, run your code with
Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck/serial`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP. They are based on
the `/gpu/cuda/*` backends. ROCm version 4.2 or newer is required.

The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level
MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends
with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent
directory, or set `MAGMA_DIR` to the proper location. MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is built for either CUDA or HIP, but not both. The corresponding
set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built
for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#`
after the resource name. For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance. To enable the OCCA backend, the environment variable `OCCA_DIR` must point
to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib`. (By default,
`OCCA_DIR` is set to `../occa`.)

Additionally, users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends that are capable of generating reproducible results, with the proper compilation options, are highlighted in the table above.

## Examples

libCEED comes with several examples of its usage, ranging from standalone C
codes in the `/examples/ceed` directory to examples based on external packages,
such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and
`NEK5K_DIR` variables and run:

```
cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
cd ceed/
make
./ex1-volume -ceed /cpu/self
./ex1-volume -ceed /gpu/cuda
./ex2-surface -ceed /cpu/self
./ex2-surface -ceed /gpu/cuda
cd ..

# MFEM+libCEED examples on CPU and GPU
cd mfem/
make
./bp1 -ceed /cpu/self -no-vis
./bp3 -ceed /gpu/cuda -no-vis
cd ..

# Nek5000+libCEED examples on CPU and GPU
cd nek/
make
./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
cd ..

# PETSc+libCEED examples on CPU and GPU
cd petsc/
make
./bps -problem bp1 -ceed /cpu/self
./bps -problem bp2 -ceed /gpu/cuda
./bps -problem bp3 -ceed /cpu/self
./bps -problem bp4 -ceed /gpu/cuda
./bps -problem bp5 -ceed /cpu/self
./bps -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpsraw -problem bp1 -ceed /cpu/self
./bpsraw -problem bp2 -ceed /gpu/cuda
./bpsraw -problem bp3 -ceed /cpu/self
./bpsraw -problem bp4 -ceed /gpu/cuda
./bpsraw -problem bp5 -ceed /cpu/self
./bpsraw -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpssphere -problem bp1 -ceed /cpu/self
./bpssphere -problem bp2 -ceed /gpu/cuda
./bpssphere -problem bp3 -ceed /cpu/self
./bpssphere -problem bp4 -ceed /gpu/cuda
./bpssphere -problem bp5 -ceed /cpu/self
./bpssphere -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./area -problem cube -ceed /cpu/self -degree 3
./area -problem cube -ceed /gpu/cuda -degree 3
./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
cd ..

cd fluids/
make
./navierstokes -ceed /cpu/self -degree 1
./navierstokes -ceed /gpu/cuda -degree 1
cd ..

cd solids/
make
./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
cd ..
```

For the last example shown, sample meshes to be used in place of
`[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends
enabled. Depending on the available backends, other CEED resource
specifiers can be provided with the `-ceed` option. Other command line
arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```
make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory,
and they can be viewed using the following commands (requires Python with matplotlib):

```
cd benchmarks
python postprocess-plot.py petsc-bps-bp1-*-output.txt
python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks, which may
take some time. Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.

## Install

To install libCEED, run:

```
make install prefix=/usr/local
```

or (e.g., if creating packages):

```
make install prefix=/usr DESTDIR=/packaging/path
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags
for all languages can be set using the likes of `OPT='-O3 -march=native'`. Use
`STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```
pip install libceed
```

with the desired setuptools options, such as `--user`.

### pkg-config

In addition to the library and header, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config)
file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if
`$prefix` is a standard location or you set the environment variable
`PKG_CONFIG_PATH`:

```
cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED. This can be used with the source or
installed directories. Most build systems have support for pkg-config.

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov)
or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).

## How to Cite

If you utilize libCEED, please cite:

```
@article{libceed-joss-paper,
  author    = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean-Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title     = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal   = {Journal of Open Source Software},
  year      = {2021},
  publisher = {The Open Journal},
  volume    = {6},
  number    = {63},
  pages     = {2945},
  doi       = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author    = {Abdelfattah, Ahmad and
               Barra, Valeria and
               Beams, Natalie and
               Brown, Jed and
               Camier, Jean-Sylvain and
               Dobrev, Veselin and
               Dudouit, Yohann and
               Ghaffari, Leila and
               Kolev, Tzanio and
               Medina, David and
               Pazner, Will and
               Ratnayaka, Thilina and
               Thompson, Jeremy L and
               Tomov, Stanimire},
  title     = {{libCEED} User Manual},
  month     = jul,
  year      = 2021,
  publisher = {Zenodo},
  version   = {0.9.0},
  doi       = {10.5281/zenodo.5077489}
}
```

For libCEED's Python interface please cite:

```
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the
`doc/bib/references.bib` file.

## Copyright

The following copyright applies to each file in the CEED software suite, unless
otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.

[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
[github-link]: https://github.com/CEED/libCEED/actions
[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
[codecov-link]: https://codecov.io/gh/CEED/libCEED/
[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
[license-link]: https://opensource.org/licenses/BSD-2-Clause
[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
[doc-link]: https://libceed.org/en/latest/?badge=latest
[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
[joss-link]: https://doi.org/10.21105/joss.02945
[binder-badge]: http://mybinder.org/badge_logo.svg
[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/python/tutorial-0-ceed.ipynb