# libCEED: the CEED API Library

[Build Status](https://travis-ci.org/CEED/libCEED)
[Code Coverage](https://codecov.io/gh/CEED/libCEED/)
[License: BSD-2-Clause](https://opensource.org/licenses/BSD-2-Clause)
[Documentation Status](https://libceed.readthedocs.io/en/latest/?badge=latest)
[Doxygen](https://codedocs.xyz/CEED/libCEED/)

## Code for Efficient Extensible Discretization

This repository contains an initial low-level API library for the efficient
high-order discretization methods developed by the ECP co-design [Center for
Efficient Exascale Discretizations (CEED)](http://ceed.exascaleproject.org).
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [User manual](https://libceed.readthedocs.io/en/latest/) and the API documentation portion of the [Doxygen documentation](https://codedocs.xyz/CEED/libCEED/md_doc_libCEEDapi.html).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, both in terms
of the FLOPs needed for its evaluation and the memory transfer needed for a
matvec. Thus, high-order methods require a new "format" that still represents a
linear (or more generally nonlinear) operator, but not through a sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on an algebraically [factored form](https://libceed.readthedocs.io/en/latest/libCEEDapi.html),
which is easy to incorporate into a wide variety of applications without
significant refactoring of their own discretization infrastructure.
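Schematically, in the conventional CEED notation (a sketch following the linked documentation, not symbols defined in this README), the factored form applies a global operator as a composition of structured factors rather than as an assembled sparse matrix:

```latex
% Sketch of the CEED factored operator form (conventional notation):
%   P - parallel prolongation/assembly (handled by the application, outside libCEED)
%   G - element restriction: gather per-element degrees of freedom
%   B - basis action: interpolate values/gradients to quadrature points
%   D - pointwise, block-diagonal operation at quadrature points
A \;=\; P^{\top} \, G^{\top} \, B^{\top} \, D \, B \, G \, P
```

Only the pointwise factor carries problem-specific data; the remaining factors have tensor or gather/scatter structure that can be applied matrix-free, which is what makes this representation efficient at high order.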
The repository is part of the [CEED software suite][ceed-soft], a collection of
software benchmarks, miniapps, libraries and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See http://github.com/ceed for more information and source code availability.

The CEED research is supported by the [Exascale Computing Project][ecp]
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a [capable
exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering and early testbed
platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see http://ceed.exascaleproject.org/ceed-code/.

For detailed instructions on how to build libCEED and run benchmarks and
examples, please see the dedicated [Getting Started](https://libceed.readthedocs.io/en/latest/gettingstarted.html)
page in the [User manual](https://libceed.readthedocs.io/en/latest/). A short
summary is provided here.

## Building

The CEED library, `libceed`, is a C99 library with no external dependencies.
The library has Fortran and Python interfaces; see `interface/ceed-fortran.c`
and `interface/ceed-python/`. It can be built using

    make

or, with optimization flags,

    make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'

These optimization flags are used by all languages (C, C++, Fortran), and this
makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be manually specified via

    make AVX=1

or

    make AVX=0

if your compiler does not support gcc-style options, if you are cross
compiling, etc.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

    make test

or, using the `prove` tool distributed with Perl (recommended):

    make prove

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource            | Backend                                           |
| :----------------------- | :------------------------------------------------ |
| `/cpu/self/ref/serial`   | Serial reference implementation                   |
| `/cpu/self/ref/blocked`  | Blocked reference implementation                  |
| `/cpu/self/memcheck`     | Memcheck backend, undefined value checks          |
| `/cpu/self/opt/serial`   | Serial optimized C implementation                 |
| `/cpu/self/opt/blocked`  | Blocked optimized C implementation                |
| `/cpu/self/avx/serial`   | Serial AVX implementation                         |
| `/cpu/self/avx/blocked`  | Blocked AVX implementation                        |
| `/cpu/self/xsmm/serial`  | Serial LIBXSMM implementation                     |
| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation                    |
| `/cpu/occa`              | Serial OCCA kernels                               |
| `/gpu/occa`              | CUDA OCCA kernels                                 |
| `/omp/occa`              | OpenMP OCCA kernels                               |
| `/ocl/occa`              | OpenCL OCCA kernels                               |
| `/gpu/cuda/ref`          | Reference pure CUDA kernels                       |
| `/gpu/cuda/reg`          | Pure CUDA kernels using one thread per element    |
| `/gpu/cuda/shared`       | Optimized pure CUDA kernels using shared memory   |
| `/gpu/cuda/gen`          | Optimized pure CUDA kernels using code generation |
| `/gpu/magma`             | CUDA MAGMA kernels                                |

The `/cpu/self/*/serial` backends process one element at a time and are intended
for meshes with a smaller number of high-order elements. The
`/cpu/self/*/blocked` backends process blocked batches of eight interlaced
elements and are intended for meshes with higher numbers of elements.
The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use, run your code with
Valgrind and the Memcheck backend, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/magma` backend relies upon the [MAGMA](https://bitbucket.org/icl/magma) package.