# libCEED: Efficient Extensible Discretization

[![GitHub Actions][github-badge]][github-link]
[![GitLab-CI][gitlab-badge]][gitlab-link]
[![Code coverage][codecov-badge]][codecov-link]
[![BSD-2-Clause][license-badge]][license-link]
[![Documentation][doc-badge]][doc-link]
[![JOSS paper][joss-badge]][joss-link]
[![Binder][binder-badge]][binder-link]

## Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for
performance portability, run-time flexibility, and clean embedding in
higher-level libraries and applications. It offers a C99 interface as well as
bindings for Fortran, Python, Julia, and Rust.
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [user manual](https://libceed.org/en/latest/) and the
API implementation portion of the
[documentation](https://libceed.org/en/latest/api/).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, both with
respect to the FLOPs needed for its evaluation and the memory transfer
needed for a matvec. Thus, high-order methods require a new "format" that still
represents a linear (or more generally non-linear) operator, but not through a
sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on an algebraically
[factored form](https://libceed.org/en/latest/libCEEDapi/#finite-element-operator-decomposition),
which is easy to incorporate in a wide variety of applications without significant
refactoring of their own discretization infrastructure.
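The factored form can be made concrete with a small sketch. The following is plain NumPy, not the libCEED API: it applies a 1D mass operator on a uniform mesh of linear elements as a sequence of element restriction, basis evaluation, pointwise multiplication, and their transposes, without ever assembling a sparse matrix. All names here are illustrative, not libCEED identifiers.

```python
import numpy as np

# Illustrative sketch (plain NumPy, not the libCEED API) of a factored
# operator: restriction, basis, pointwise data, and their transposes.
n_el = 4                 # number of elements on [0, 1]
n_nodes = n_el + 1       # continuous linear (p = 1) nodal space
h = 1.0 / n_el           # uniform element width

# Element restriction: local node j of element e is global node e + j
restrict = np.zeros((n_el, 2, n_nodes))
for e in range(n_el):
    restrict[e, 0, e] = 1.0
    restrict[e, 1, e + 1] = 1.0

# Basis evaluation at two Gauss points on the reference element [-1, 1]
q = np.array([-1.0, 1.0]) / np.sqrt(3.0)
B = np.column_stack(((1 - q) / 2, (1 + q) / 2))  # (n_quad, n_local_nodes)
w = np.array([1.0, 1.0])                         # Gauss weights

# Pointwise "QFunction" data: quadrature weight times Jacobian determinant h/2
D = w * (h / 2)

def apply_mass(u):
    """Apply the mass operator without assembling a matrix."""
    u_e = np.einsum('eij,j->ei', restrict, u)     # gather element values
    u_q = u_e @ B.T                               # interpolate to quadrature points
    v_q = u_q * D                                 # pointwise multiply
    v_e = v_q @ B                                 # apply basis transpose
    return np.einsum('eij,ei->j', restrict, v_e)  # scatter-add back

# Applying to u = 1 integrates the basis functions: h at interior nodes, h/2 at ends
v = apply_mass(np.ones(n_nodes))
```

Because `u = 1` integrates each basis function, the entries of `v` sum to the measure of the domain, here 1.0; the real library applies the same composition with batched, device-resident kernels.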
The repository is part of the
[CEED software suite](http://ceed.exascaleproject.org/software/), a collection of
software benchmarks, miniapps, libraries, and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See <http://github.com/ceed> for more information and source code availability.

The CEED research is supported by the
[Exascale Computing Project](https://exascaleproject.org/exascale-computing-project)
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a
[capable exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering, and early testbed
platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the [user manual](https://libceed.org/en/latest/).

% gettingstarted-inclusion-marker

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies and
with Fortran, Python, Julia, and Rust interfaces. It can be built using:

```
make
```

or, with optimization flags:

```
make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
```

These optimization flags are used by all languages (C, C++, Fortran), and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via:

```
make AVX=1
```

or:

```
make AVX=0
```

if your compiler does not support gcc-style options, if you are cross-compiling,
etc.

To enable CUDA support, add `CUDA_DIR=/opt/cuda` or an appropriate directory
to your `make` invocation.
To enable HIP support, add `HIP_DIR=/opt/rocm` or
an appropriate directory. To store these or other arguments as defaults for
future invocations of `make`, use:

```
make configure CUDA_DIR=/usr/local/cuda HIP_DIR=/opt/rocm OPT='-O3 -march=znver2'
```

which stores these variables in `config.mk`.

## Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

```
pip install libceed
```

or, in a clone of the repository, via `pip install .`.

Julia users can install using:

```
$ julia
julia> ]
pkg> add LibCEED
```

See the [LibCEED.jl documentation](http://ceed.exascaleproject.org/libCEED-julia-docs/dev/)
for more information.

Rust users can include libCEED via `Cargo.toml`:

```toml
[dependencies]
libceed = { git = "https://github.com/CEED/libCEED", branch = "main" }
```

See the [Cargo documentation](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) for details.
134 135## Testing 136 137The test suite produces [TAP](https://testanything.org) output and is run by: 138 139``` 140make test 141``` 142 143or, using the `prove` tool distributed with Perl (recommended): 144 145``` 146make prove 147``` 148 149## Backends 150 151There are multiple supported backends, which can be selected at runtime in the examples: 152 153| CEED resource | Backend | Deterministic Capable | 154| :--- | :--- | :---: | 155|| 156| **CPU Native** | 157| `/cpu/self/ref/serial` | Serial reference implementation | Yes | 158| `/cpu/self/ref/blocked` | Blocked reference implementation | Yes | 159| `/cpu/self/opt/serial` | Serial optimized C implementation | Yes | 160| `/cpu/self/opt/blocked` | Blocked optimized C implementation | Yes | 161| `/cpu/self/avx/serial` | Serial AVX implementation | Yes | 162| `/cpu/self/avx/blocked` | Blocked AVX implementation | Yes | 163|| 164| **CPU Valgrind** | 165| `/cpu/self/memcheck/*` | Memcheck backends, undefined value checks | Yes | 166|| 167| **CPU LIBXSMM** | 168| `/cpu/self/xsmm/serial` | Serial LIBXSMM implementation | Yes | 169| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation | Yes | 170|| 171| **CUDA Native** | 172| `/gpu/cuda/ref` | Reference pure CUDA kernels | Yes | 173| `/gpu/cuda/shared` | Optimized pure CUDA kernels using shared memory | Yes | 174| `/gpu/cuda/gen` | Optimized pure CUDA kernels using code generation | No | 175|| 176| **HIP Native** | 177| `/gpu/hip/ref` | Reference pure HIP kernels | Yes | 178| `/gpu/hip/shared` | Optimized pure HIP kernels using shared memory | Yes | 179| `/gpu/hip/gen` | Optimized pure HIP kernels using code generation | No | 180|| 181| **MAGMA** | 182| `/gpu/cuda/magma` | CUDA MAGMA kernels | No | 183| `/gpu/cuda/magma/det` | CUDA MAGMA kernels | Yes | 184| `/gpu/hip/magma` | HIP MAGMA kernels | No | 185| `/gpu/hip/magma/det` | HIP MAGMA kernels | Yes | 186|| 187| **OCCA** | 188| `/*/occa` | Selects backend based on available OCCA modes | Yes | 189| 
`/cpu/self/occa` | OCCA backend with serial CPU kernels | Yes | 190| `/cpu/openmp/occa` | OCCA backend with OpenMP kernels | Yes | 191| `/gpu/cuda/occa` | OCCA backend with CUDA kernels | Yes | 192| `/gpu/hip/occa`~ | OCCA backend with HIP kernels | Yes | 193 194The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes 195with a smaller number of high order elements. The `/cpu/self/*/blocked` backends process 196blocked batches of eight interlaced elements and are intended for meshes with higher numbers 197of elements. 198 199The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality. 200 201The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance. 202 203The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance. 204 205The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool 206to help verify that user QFunctions have no undefined values. To use, run your code with 207Valgrind and the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/ref/memcheck`. A 208'development' or 'debugging' version of Valgrind with headers is required to use this backend. 209This backend can be run in serial or blocked mode and defaults to running in the serial mode 210if `/cpu/self/memcheck` is selected at runtime. 211 212The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package 213to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but 214the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be 215forced by setting the environment variable `MKL=1`. 216 217The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA. 218 219The `/gpu/hip/*` backends provide GPU performance strictly using HIP. They are based on 220the `/gpu/cuda/*` backends. ROCm version 4.2 or newer is required. 
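The interlaced blocked layout used by the `/cpu/self/*/blocked` backends can be sketched as follows. This is a hypothetical NumPy illustration of the idea, not libCEED's actual internal data structure: the element index is made innermost so that one vectorized operation advances all eight elements of a block at once.

```python
import numpy as np

# Hypothetical sketch (not libCEED's actual layout): batches of 8 elements
# are interlaced so SIMD lanes each carry one element of the block.
n_el, n_nodes_per_el, block = 16, 4, 8
u_e = np.arange(n_el * n_nodes_per_el, dtype=float).reshape(n_el, n_nodes_per_el)

# Interlace into shape (n_blocks, n_nodes_per_el, block): within a block,
# node values of the 8 elements sit next to each other in memory
u_blocked = u_e.reshape(-1, block, n_nodes_per_el).transpose(0, 2, 1)

# A toy "basis" action (averaging nodal values at 2 quadrature points)
# now applies to all 8 elements of each block in one matrix product
B = np.full((2, n_nodes_per_el), 1.0 / n_nodes_per_el)
u_q = B @ u_blocked  # shape (n_blocks, n_quad, block)
```

With this layout a single tensor contraction processes a whole block, which is why the blocked backends favor meshes with many elements while the serial backends suit meshes with few, high-order elements.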
The `/gpu/*/magma/*` backends rely upon the [MAGMA](https://bitbucket.org/icl/magma) package.
To enable the MAGMA backends, the environment variable `MAGMA_DIR` must point to the top-level
MAGMA directory, with the MAGMA library located in `$(MAGMA_DIR)/lib/`.
By default, `MAGMA_DIR` is set to `../magma`; to build the MAGMA backends
with a MAGMA installation located elsewhere, create a link to `magma/` in libCEED's parent
directory, or set `MAGMA_DIR` to the proper location. MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is built for either CUDA or HIP, but not both. The corresponding
set of libCEED backends (`/gpu/cuda/magma/*` or `/gpu/hip/magma/*`) will automatically be built
for the version of the MAGMA library found in `MAGMA_DIR`.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding `:device_id=#`
after the resource name. For example:

> - `/gpu/cuda/gen:device_id=1`

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance. To enable the OCCA backend, the environment variable `OCCA_DIR` must point
to the top-level OCCA directory, with the OCCA library located in `${OCCA_DIR}/lib` (by default,
`OCCA_DIR` is set to `../occa`).

Additionally, users can pass specific OCCA device properties after setting the CEED resource.
For example:

> - `"/*/occa:mode='CUDA',device_id=0"`

Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as `atomicAdd`, for increased performance.
The backends that are capable of generating reproducible results, with the proper compilation options, are highlighted in the list above.
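The reproducibility caveat comes down to floating-point addition not being associative: when an `atomicAdd`-style reduction accumulates contributions in whatever order threads happen to run, the rounded result can change between runs. A tiny Python illustration (not libCEED code) of the underlying effect:

```python
# Floating-point addition is not associative, so reductions that sum the same
# terms in a different order (as atomicAdd-based kernels may from run to run)
# can differ in the last bits of the result.
left_to_right = (0.1 + 0.2) + 0.3  # one possible summation order
right_to_left = (0.3 + 0.2) + 0.1  # another order of the same three terms

# The two results differ by one unit in the last place, even though the
# mathematical sum is identical.
```

Deterministic backends avoid this by fixing the reduction order (e.g., gathering element contributions instead of atomically scattering them), at some cost in performance.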
250 251## Examples 252 253libCEED comes with several examples of its usage, ranging from standalone C 254codes in the `/examples/ceed` directory to examples based on external packages, 255such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required. 256 257To build the examples, set the `MFEM_DIR`, `PETSC_DIR`, and 258`NEK5K_DIR` variables and run: 259 260``` 261cd examples/ 262``` 263 264% running-examples-inclusion-marker 265 266```console 267# libCEED examples on CPU and GPU 268cd ceed/ 269make 270./ex1-volume -ceed /cpu/self 271./ex1-volume -ceed /gpu/cuda 272./ex2-surface -ceed /cpu/self 273./ex2-surface -ceed /gpu/cuda 274cd .. 275 276# MFEM+libCEED examples on CPU and GPU 277cd mfem/ 278make 279./bp1 -ceed /cpu/self -no-vis 280./bp3 -ceed /gpu/cuda -no-vis 281cd .. 282 283# Nek5000+libCEED examples on CPU and GPU 284cd nek/ 285make 286./nek-examples.sh -e bp1 -ceed /cpu/self -b 3 287./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3 288cd .. 289 290# PETSc+libCEED examples on CPU and GPU 291cd petsc/ 292make 293./bps -problem bp1 -ceed /cpu/self 294./bps -problem bp2 -ceed /gpu/cuda 295./bps -problem bp3 -ceed /cpu/self 296./bps -problem bp4 -ceed /gpu/cuda 297./bps -problem bp5 -ceed /cpu/self 298./bps -problem bp6 -ceed /gpu/cuda 299cd .. 300 301cd petsc/ 302make 303./bpsraw -problem bp1 -ceed /cpu/self 304./bpsraw -problem bp2 -ceed /gpu/cuda 305./bpsraw -problem bp3 -ceed /cpu/self 306./bpsraw -problem bp4 -ceed /gpu/cuda 307./bpsraw -problem bp5 -ceed /cpu/self 308./bpsraw -problem bp6 -ceed /gpu/cuda 309cd .. 310 311cd petsc/ 312make 313./bpssphere -problem bp1 -ceed /cpu/self 314./bpssphere -problem bp2 -ceed /gpu/cuda 315./bpssphere -problem bp3 -ceed /cpu/self 316./bpssphere -problem bp4 -ceed /gpu/cuda 317./bpssphere -problem bp5 -ceed /cpu/self 318./bpssphere -problem bp6 -ceed /gpu/cuda 319cd .. 

cd petsc/
make
./area -problem cube -ceed /cpu/self -degree 3
./area -problem cube -ceed /gpu/cuda -degree 3
./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2
cd ..

cd fluids/
make
./navierstokes -ceed /cpu/self -degree 1
./navierstokes -ceed /gpu/cuda -degree 1
cd ..

cd solids/
make
./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
cd ..
```

For the last example shown, sample meshes to be used in place of
`[.exo file]` can be found at <https://github.com/jeremylt/ceedSampleMeshes>.

The above code assumes a GPU-capable machine with the CUDA backends
enabled. Depending on the available backends, other CEED resource
specifiers can be provided with the `-ceed` option. Other command line
arguments can be found in [examples/petsc](https://github.com/CEED/libCEED/blob/main/examples/petsc/README.md).

% benchmarks-marker

## Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

```
make benchmarks
```

The results from the benchmarks are stored inside the `benchmarks/` directory,
and they can be viewed using the following commands (requires Python with matplotlib):

```
cd benchmarks
python postprocess-plot.py petsc-bps-bp1-*-output.txt
python postprocess-plot.py petsc-bps-bp3-*-output.txt
```

Using the `benchmarks` target runs a comprehensive set of benchmarks, which may
take some time. Subsets of the benchmarks can be run using the scripts in the `benchmarks` folder.

For more details about the benchmarks, see the `benchmarks/README.md` file.
372 373## Install 374 375To install libCEED, run: 376 377``` 378make install prefix=/path/to/install/dir 379``` 380 381or (e.g., if creating packages): 382 383``` 384make install prefix=/usr DESTDIR=/packaging/path 385``` 386 387To build and install in separate steps, run: 388 389``` 390make for_install=1 prefix=/path/to/install/dir 391make install prefix=/path/to/install/dir 392``` 393 394The usual variables like `CC` and `CFLAGS` are used, and optimization flags 395for all languages can be set using the likes of `OPT='-O3 -march=native'`. Use 396`STATIC=1` to build static libraries (`libceed.a`). 397 398To install libCEED for Python, run: 399 400``` 401pip install libceed 402``` 403 404with the desired setuptools options, such as `--user`. 405 406### pkg-config 407 408In addition to library and header, libCEED provides a [pkg-config](https://en.wikipedia.org/wiki/Pkg-config) 409file that can be used to easily compile and link. 410[For example](https://people.freedesktop.org/~dbn/pkg-config-guide.html#faq), if 411`$prefix` is a standard location or you set the environment variable 412`PKG_CONFIG_PATH`: 413 414``` 415cc `pkg-config --cflags --libs ceed` -o myapp myapp.c 416``` 417 418will build `myapp` with libCEED. This can be used with the source or 419installed directories. Most build systems have support for pkg-config. 420 421## Contact 422 423You can reach the libCEED team by emailing [ceed-users@llnl.gov](mailto:ceed-users@llnl.gov) 424or by leaving a comment in the [issue tracker](https://github.com/CEED/libCEED/issues). 
425 426## How to Cite 427 428If you utilize libCEED please cite: 429 430``` 431@article{libceed-joss-paper, 432 author = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov}, 433 title = {{libCEED}: Fast algebra for high-order element-based discretizations}, 434 journal = {Journal of Open Source Software}, 435 year = {2021}, 436 publisher = {The Open Journal}, 437 volume = {6}, 438 number = {63}, 439 pages = {2945}, 440 doi = {10.21105/joss.02945} 441} 442 443@misc{libceed-user-manual, 444 author = {Abdelfattah, Ahmad and 445 Barra, Valeria and 446 Beams, Natalie and 447 Brown, Jed and 448 Camier, Jean-Sylvain and 449 Dobrev, Veselin and 450 Dudouit, Yohann and 451 Ghaffari, Leila and 452 Kolev, Tzanio and 453 Medina, David and 454 Pazner, Will and 455 Ratnayaka, Thilina and 456 Thompson, Jeremy L and 457 Tomov, Stanimire}, 458 title = {{libCEED} User Manual}, 459 month = jul, 460 year = 2021, 461 publisher = {Zenodo}, 462 version = {0.9.0}, 463 doi = {10.5281/zenodo.5077489} 464} 465``` 466 467For libCEED's Python interface please cite: 468 469``` 470@InProceedings{libceed-paper-proc-scipy-2020, 471 author = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit}, 472 title = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface}, 473 booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference}, 474 pages = {85 - 90}, 475 year = {2020}, 476 editor = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe}, 477 doi = {10.25080/Majora-342d178e-00c} 478} 479``` 480 481The BiBTeX entries for these references can be found in the 482`doc/bib/references.bib` file. 
483 484## Copyright 485 486The following copyright applies to each file in the CEED software suite, unless 487otherwise stated in the file: 488 489> Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the 490> Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved. 491 492See files LICENSE and NOTICE for details. 493 494[github-badge]: https://github.com/CEED/libCEED/workflows/C/Fortran/badge.svg 495[github-link]: https://github.com/CEED/libCEED/actions 496[gitlab-badge]: https://gitlab.com/libceed/libCEED/badges/main/pipeline.svg?key_text=GitLab-CI 497[gitlab-link]: https://gitlab.com/libceed/libCEED/-/pipelines?page=1&scope=all&ref=main 498[codecov-badge]: https://codecov.io/gh/CEED/libCEED/branch/main/graphs/badge.svg 499[codecov-link]: https://codecov.io/gh/CEED/libCEED/ 500[license-badge]: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg 501[license-link]: https://opensource.org/licenses/BSD-2-Clause 502[doc-badge]: https://readthedocs.org/projects/libceed/badge/?version=latest 503[doc-link]: https://libceed.org/en/latest/?badge=latest 504[joss-badge]: https://joss.theoj.org/papers/10.21105/joss.02945/status.svg 505[joss-link]: https://doi.org/10.21105/joss.02945 506[binder-badge]: http://mybinder.org/badge_logo.svg 507[binder-link]: https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/python/tutorial-0-ceed.ipynb 508