# libCEED: the CEED API Library

[Build Status](https://travis-ci.org/CEED/libCEED)
[Code Coverage](https://codecov.io/gh/CEED/libCEED/)
[License](https://opensource.org/licenses/BSD-2-Clause)
[Documentation](https://libceed.readthedocs.io/en/latest/?badge=latest)
[Doxygen](https://codedocs.xyz/CEED/libCEED/)

## Code for Efficient Extensible Discretization

This repository contains an initial low-level API library for the efficient
high-order discretization methods developed by the ECP co-design [Center for
Efficient Exascale Discretizations (CEED)](http://ceed.exascaleproject.org).
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [User manual](https://libceed.readthedocs.io/en/latest/) and
the API implementation portion of the
[documentation](https://libceed.readthedocs.io/en/latest/libCEEDapi.html).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, either in
terms of the FLOPs needed for its evaluation or the memory transfer needed for
a matvec. Thus, high-order methods require a new "format" that still represents
a linear (or more generally non-linear) operator, but not through a sparse
matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new
operator description is based on an algebraically
[factored form](https://libceed.readthedocs.io/en/latest/libCEEDapi.html),
which is easy to incorporate into a wide variety of applications without
significant refactoring of their own discretization infrastructure.
The repository is part of the [CEED software suite][ceed-soft], a collection of
software benchmarks, miniapps, libraries and APIs for efficient exascale
discretizations based on high-order finite element and spectral element
methods. See http://github.com/ceed for more information and source code
availability.

The CEED research is supported by the [Exascale Computing Project][ecp]
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a [capable
exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering and early testbed
platforms, in support of the nation's exascale computing imperative.

For more details on the CEED API see http://ceed.exascaleproject.org/ceed-code/.

For detailed instructions on how to build libCEED and run benchmarks and
examples, please see the dedicated
[Getting Started](https://libceed.readthedocs.io/en/latest/gettingstarted.html)
page in the [User manual](https://libceed.readthedocs.io/en/latest/). A short
summary is provided here.

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies,
and with Fortran and Python interfaces. It can be built using

    make

or, with optimization flags,

    make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'

These optimization flags are used by all languages (C, C++, Fortran) and this
makefile variable can also be set for testing and examples (below).
Python users can install using

    pip install libceed

or, in a clone of the repository, via `pip install .`.
The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via

    make AVX=1

or

    make AVX=0

if your compiler does not support gcc-style options, if you are cross
compiling, etc.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

    make test

or, using the `prove` tool distributed with Perl (recommended):

    make prove

## Backends

There are multiple supported backends, which can be selected at runtime in the
examples:

| CEED resource            | Backend                                           |
| :----------------------- | :------------------------------------------------ |
| `/cpu/self/ref/serial`   | Serial reference implementation                   |
| `/cpu/self/ref/blocked`  | Blocked reference implementation                  |
| `/cpu/self/memcheck`     | Memcheck backend, undefined value checks          |
| `/cpu/self/opt/serial`   | Serial optimized C implementation                 |
| `/cpu/self/opt/blocked`  | Blocked optimized C implementation                |
| `/cpu/self/avx/serial`   | Serial AVX implementation                         |
| `/cpu/self/avx/blocked`  | Blocked AVX implementation                        |
| `/cpu/self/xsmm/serial`  | Serial LIBXSMM implementation                     |
| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation                    |
| `/cpu/occa`              | Serial OCCA kernels                               |
| `/gpu/occa`              | CUDA OCCA kernels                                 |
| `/omp/occa`              | OpenMP OCCA kernels                               |
| `/ocl/occa`              | OpenCL OCCA kernels                               |
| `/gpu/cuda/ref`          | Reference pure CUDA kernels                       |
| `/gpu/cuda/reg`          | Pure CUDA kernels using one thread per element    |
| `/gpu/cuda/shared`       | Optimized pure CUDA kernels using shared memory   |
| `/gpu/cuda/gen`          | Optimized pure CUDA kernels using code generation |
| `/gpu/magma`             | CUDA MAGMA kernels                                |

The `/cpu/self/*/serial` backends process one element at a time and are
intended for meshes with a smaller number of high-order elements. The
`/cpu/self/*/blocked` backends process blocked batches of eight interlaced
elements and are intended for meshes with larger numbers of elements.
The `/cpu/self/ref/*` backends are written in pure C and provide basic
functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors
to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized
CPU performance.

The `/cpu/self/xsmm/*` backends rely upon the
[LIBXSMM](http://github.com/hfp/libxsmm) package to provide vectorized CPU
performance. If linking MKL and LIBXSMM is desired but the Makefile is not
detecting `MKLROOT`, linking libCEED against MKL can be forced by setting the
environment variable `MKL=1`.

The `/cpu/self/memcheck/*` backends rely upon the
[Valgrind](http://valgrind.org/) Memcheck tool to help verify that user
QFunctions have no undefined values. To use, run your code with Valgrind and
the Memcheck backends, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck`. A
'development' or 'debugging' version of Valgrind with headers is required to
use this backend. This backend can be run in serial or blocked mode and
defaults to running in serial mode if `/cpu/self/memcheck` is selected at
runtime.

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa)
package to provide cross-platform performance.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/magma` backend relies upon the
[MAGMA](https://bitbucket.org/icl/magma) package.