# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Azure Pipelines][azure-badge]][azure-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for
performance portability, run-time flexibility, and clean embedding in
higher-level libraries and applications. It offers a C99 interface as well as
bindings for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [user manual](https://libceed.readthedocs.io/en/latest/) and
the API implementation portion of the
[documentation](https://libceed.readthedocs.io/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, both in terms
of the FLOPs needed for its evaluation and the memory transfer needed for a
matvec. Thus, high-order methods require a new "format" that still represents a
linear (or more generally non-linear) operator, but not through a sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on an algebraically
[factored form](https://libceed.readthedocs.io/en/latest/libCEEDapi/#finite-element-operator-decomposition),
which is easy to incorporate into a wide variety of applications without
significant refactoring of their own discretization infrastructure.

The repository is part of the
[CEED software suite](http://ceed.exascaleproject.org/software/), a collection of
software benchmarks, miniapps, libraries, and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the
[Exascale Computing Project](https://exascaleproject.org/exascale-computing-project)
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a
[capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering, and early testbed
platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.readthedocs.io/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies, and
with Fortran, Python, Julia, and Rust interfaces. It can be built using:

```
make
```

or, with optimization flags:

```
make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran), and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be specified manually via:

```
make AVX=1
```

or:

```
make AVX=0
```

if your compiler does not support gcc-style options, if you are cross
compiling, etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory
to your `make` invocation. To enable HIP support, add `HIP_DIR=/opt/rocm` or
an appropriate directory. To store these or other arguments as defaults for
future invocations of `make`, use:

```
make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```
pip install libceed
```

or in a clone of the repository via `pip install .`.

Julia users can install using:

```
$ julia
julia> ]
pkg> add LibCEED
```

in the Julia package manager or in a clone of the repository via:

```
JULIA_LIBCEED_LIB=/path/to/libceed.so julia
julia> # press ] to enter package manager
(env) pkg> build LibCEED
```

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.
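All of these language bindings expose the same operator format: the algebraically factored form described in the Summary. As a language-agnostic illustration, the following is a minimal pure-Python sketch (independent of libCEED; none of these names are part of the libCEED API) of applying a 1D mass operator matrix-free as `E^T B^T D B E`, where `E` is the element restriction, `B` interpolates to quadrature points, and `D` is pointwise multiplication by quadrature weight times Jacobian:

```python
# Matrix-free application of a 1D mass operator in factored form
#   v = E^T B^T D B E u
# on a uniform mesh of linear elements over [0, 1].
# Illustrative sketch only -- these names are not the libCEED API.

n_elem = 4
h = 1.0 / n_elem                  # element size
n_nodes = n_elem + 1

# 2-point Gauss quadrature on the reference element [-1, 1]
q = 1.0 / 3.0 ** 0.5
q_pts, q_wts = [-q, q], [1.0, 1.0]
# B[k][i]: value of linear shape function i at quadrature point k
B = [[(1.0 - x) / 2.0, (1.0 + x) / 2.0] for x in q_pts]
jac = h / 2.0                     # dx/dX, constant on each element

def apply_mass(u):
    v = [0.0] * n_nodes
    for e in range(n_elem):
        ue = [u[e], u[e + 1]]                                        # E: gather
        uq = [B[k][0] * ue[0] + B[k][1] * ue[1] for k in range(2)]   # B
        dq = [q_wts[k] * jac * uq[k] for k in range(2)]              # D
        for i in range(2):                              # B^T, then E^T: scatter
            v[e + i] += B[0][i] * dq[0] + B[1][i] * dq[1]
    return v

# Applying the mass operator to the constant function 1 integrates the basis
# functions, so the entries of v sum to the measure of the domain (here 1.0,
# up to rounding).
print(sum(apply_mass([1.0] * n_nodes)))
```

No global sparse matrix is ever formed; only the small reference-element basis `B` and pointwise data `D` are stored, which is the property that makes this format efficient for high-order elements.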
## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

```
make test
```

or, using the `prove` tool distributed with Perl (recommended):

```
make prove
```

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource | Backend | Deterministic Capable |
| :--- | :--- | :---: |
||
| **CPU Native** |
| `/cpu/self/ref/serial` | Serial reference implementation | Yes |
| `/cpu/self/ref/blocked` | Blocked reference implementation | Yes |
| `/cpu/self/opt/serial` | Serial optimized C implementation | Yes |
| `/cpu/self/opt/blocked` | Blocked optimized C implementation | Yes |
| `/cpu/self/avx/serial` | Serial AVX implementation | Yes |
| `/cpu/self/avx/blocked` | Blocked AVX implementation | Yes |
||
| **CPU Valgrind** |
| `/cpu/self/memcheck/*` | Memcheck backends, undefined value checks | Yes |
||
| **CPU LIBXSMM** |
| `/cpu/self/xsmm/serial` | Serial LIBXSMM implementation | Yes |
| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation | Yes |
||
| **CUDA Native** |
| `/gpu/cuda/ref` | Reference pure CUDA kernels | Yes |
| `/gpu/cuda/shared` | Optimized pure CUDA kernels using shared memory | Yes |
| `/gpu/cuda/gen` | Optimized pure CUDA kernels using code generation | No |
||
| **HIP Native** |
| `/gpu/hip/ref` | Reference pure HIP kernels | Yes |
| `/gpu/hip/shared` | Optimized pure HIP kernels using shared memory | Yes |
| `/gpu/hip/gen` | Optimized pure HIP kernels using code generation | No |
||
| **MAGMA** |
| `/gpu/cuda/magma` | CUDA MAGMA kernels | No |
| `/gpu/cuda/magma/det` | CUDA MAGMA kernels | Yes |
| `/gpu/hip/magma` | HIP MAGMA kernels | No |
| `/gpu/hip/magma/det` | HIP MAGMA kernels | Yes |
||
| **OCCA** |
| `/*/occa` | Selects backend based on available OCCA modes | Yes |
| `/cpu/self/occa` | OCCA backend with serial CPU kernels | Yes |
| `/cpu/openmp/occa` | OCCA backend with OpenMP kernels | Yes |
| `/gpu/cuda/occa` | OCCA backend with CUDA kernels | Yes |
| `/gpu/hip/occa` | OCCA backend with HIP kernels | Yes |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes
with a smaller number of high-order elements. The `/cpu/self/*/blocked` backends process
blocked batches of eight interlaced elements and are intended for meshes with higher numbers
of elements.

The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use, run your code with
Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck/serial`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to the serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/hip/*` backends provide GPU performance strictly using HIP. They are based on
the `/gpu/cuda/*` backends. ROCm version 3.6 or newer is required.
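The interlacing used by the blocked backends can be pictured as a transposed data layout: rather than storing each element's dofs contiguously, a block of eight elements is stored node-major, so that a single SIMD lane can be dedicated to each element. A hypothetical pure-Python sketch of such a layout (this is an illustration of the idea, not libCEED's actual internal code):

```python
# Hypothetical sketch of an interlaced element layout: for a block of 8
# elements, the value for node n of element e is stored at position
# n * BLOCK + e, so the 8 elements occupy consecutive SIMD lanes.
BLOCK = 8
nodes_per_elem = 4

# element-major layout: dofs[e][n] is node n of element e
dofs = [[100 * e + n for n in range(nodes_per_elem)] for e in range(BLOCK)]

# interlace: one contiguous run of BLOCK values per node index
interlaced = [dofs[e][n] for n in range(nodes_per_elem) for e in range(BLOCK)]

# A vectorized kernel can now process interlaced[n*BLOCK:(n+1)*BLOCK] with a
# single 8-wide instruction, touching the same node of 8 different elements.
print(interlaced[:BLOCK])  # node 0 of elements 0..7
```

This is why the blocked backends favor meshes with many elements: with fewer than a full block, some lanes go unused.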
The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level
MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends
with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent
directory, or set `MAGMA_DIR` to the proper location. MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is built for either CUDA or HIP, but not both. The
corresponding set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will
automatically be built for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#`
after the resource name. For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance. To enable the OCCA backend, the environment variable `OCCA_DIR` must
point to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib`
(by default, `OCCA_DIR` is set to `../occa`).

Additionally, users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends that are capable of generating reproducible results, with the proper compilation options, are highlighted in the list above.
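The reproducibility caveat comes down to floating-point arithmetic: addition is not associative, so the accumulation order changes the rounded result. Backends that accumulate with atomics add in whatever order threads happen to run, which is why they cannot be bit-for-bit deterministic. A minimal illustration:

```python
# Floating-point addition is not associative: the same three values
# accumulated in different orders can round to different results. This is
# why atomics-based accumulation, whose order depends on thread scheduling,
# is not bit-for-bit reproducible between runs.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one accumulation order
right = a + (b + c)   # another accumulation order
print(left == right)  # False: the two orders round differently
```

The deterministic-capable backends avoid this by using a fixed accumulation order, trading some performance for run-to-run reproducibility.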
## Examples

libCEED comes with several examples of its usage, ranging from standalone C
codes in the `/examples/ceed` directory to examples based on external packages,
such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required.

To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and
`NEK5K_DIR` variables and run:

```
cd examples/
```

% running-examples-inclusion-marker

```console
# libCEED examples on CPU and GPU
cd ceed/
make
./ex1-volume -ceed /cpu/self
./ex1-volume -ceed /gpu/cuda
./ex2-surface -ceed /cpu/self
./ex2-surface -ceed /gpu/cuda
cd ..

# MFEM+libCEED examples on CPU and GPU
cd mfem/
make
./bp1 -ceed /cpu/self -no-vis
./bp3 -ceed /gpu/cuda -no-vis
cd ..

# Nek5000+libCEED examples on CPU and GPU
cd nek/
make
./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
cd ..

# PETSc+libCEED examples on CPU and GPU
cd petsc/
make
./bps -problem bp1 -ceed /cpu/self
./bps -problem bp2 -ceed /gpu/cuda
./bps -problem bp3 -ceed /cpu/self
./bps -problem bp4 -ceed /gpu/cuda
./bps -problem bp5 -ceed /cpu/self
./bps -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpsraw -problem bp1 -ceed /cpu/self
./bpsraw -problem bp2 -ceed /gpu/cuda
./bpsraw -problem bp3 -ceed /cpu/self
./bpsraw -problem bp4 -ceed /gpu/cuda
./bpsraw -problem bp5 -ceed /cpu/self
./bpsraw -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./bpssphere -problem bp1 -ceed /cpu/self
./bpssphere -problem bp2 -ceed /gpu/cuda
./bpssphere -problem bp3 -ceed /cpu/self
./bpssphere -problem bp4 -ceed /gpu/cuda
./bpssphere -problem bp5 -ceed /cpu/self
./bpssphere -problem bp6 -ceed /gpu/cuda
cd ..

cd petsc/
make
./area -problem cube -ceed /cpu/self -degree 3
./area -problem cube -ceed /gpu/cuda -degree 3
./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
cd ..

cd fluids/
make
./navierstokes -ceed /cpu/self -degree 1
./navierstokes -ceed /gpu/cuda -degree 1
cd ..

cd solids/
make
./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
cd ..
```

For the last example shown, sample meshes to be used in place of
`[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends
enabled. Depending on the available backends, other CEED resource
specifiers can be provided with the `-ceed` option. Other command line
arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```
make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory,
and they can be viewed using the following commands (requires Python with
Matplotlib):

```
cd benchmarks
python postprocess-plot.py petsc-bps-bp1-*-output.txt
python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks, which may
take some time. Subsets of the benchmarks can be run using the scripts in the
`benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.
## Install

To install libCEED, run:

```
make install prefix=/usr/local
```

or (e.g., if creating packages):

```
make install prefix=/usr DESTDIR=/packaging/path
```

The usual variables like `CC` and `CFLAGS` are used, and optimization flags
for all languages can be set using the likes of `OPT='-O3 -march=native'`. Use
`STATIC=1` to build static libraries (`libceed.a`).

To install libCEED for Python, run:

```
pip install libceed
```

with the desired setuptools options, such as `--user`.

### pkg-config

In addition to the library and headers, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config)
file that can be used to easily compile and link.
[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if
`$prefix` is a standard location or you set the environment variable
`PKG_CONFIG_PATH`:

```
cc `pkg-config --cflags --libs ceed` -o myapp myapp.c
```

will build `myapp` with libCEED. This can be used with either the source or
installed directories. Most build systems have support for pkg-config.

## Contact

You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov)
or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues).
## How to Cite

If you utilize libCEED, please cite:

```
@article{libceed-joss-paper,
  author  = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title   = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal = {Journal of Open Source Software},
  year    = {2021},
  publisher = {The Open Journal},
  volume  = {6},
  number  = {63},
  pages   = {2945},
  doi     = {10.21105/joss.02945}
}

@misc{libceed-user-manual,
  author    = {Abdelfattah, Ahmad and
               Barra, Valeria and
               Beams, Natalie and
               Brown, Jed and
               Camier, Jean-Sylvain and
               Dobrev, Veselin and
               Dudouit, Yohann and
               Ghaffari, Leila and
               Kolev, Tzanio and
               Medina, David and
               Pazner, Will and
               Ratnayaka, Thilina and
               Thompson, Jeremy L and
               Tomov, Stanimire},
  title     = {{libCEED} User Manual},
  month     = jul,
  year      = 2021,
  publisher = {Zenodo},
  version   = {0.9.0},
  doi       = {10.5281/zenodo.5077489}
}
```

For libCEED's Python interface please cite:

```
@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}
```

The BibTeX entries for these references can be found in the
`doc/bib/references.bib` file.
## Copyright

The following copyright applies to each file in the CEED software suite, unless
otherwise stated in the file:

> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the
> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.

See files LICENSE and NOTICE for details.

[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg
[github-link]: https://github.com/CEED/libCEED/actions
[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI
[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main
[azure-badge]: https://dev.azure.com/CEED-ECP/libCEED/_apis/build/status/CEED.libCEED?branchName=main
[azure-link]: https://dev.azure.com/CEED-ECP/libCEED/_build?definitionId=2
[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg
[codecov-link]: https://codecov.io/gh/CEED/libCEED/
[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg
[license-link]: https://opensource.org/licenses/BSD-2-Clause
[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest
[doc-link]: https://libceed.readthedocs.io/en/latest/?badge=latest
[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg
[joss-link]: https://doi.org/10.21105/joss.02945
[binder-badge]: http://mybinder.org/badge_logo.svg
[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/tutorials/tutorial-0-ceed.ipynb