# libCEED: the CEED API Library

[![Build Status](https://travis-ci.org/CEED/libCEED.svg?branch=master)](https://travis-ci.org/CEED/libCEED)
[![Code Coverage](https://codecov.io/gh/CEED/libCEED/branch/master/graphs/badge.svg)](https://codecov.io/gh/CEED/libCEED/)
[![License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
[![Documentation Status](https://readthedocs.org/projects/libceed/badge/?version=latest)](https://libceed.readthedocs.io/en/latest/?badge=latest)
[![Doxygen](https://codedocs.xyz/CEED/libCEED.svg)](https://codedocs.xyz/CEED/libCEED/)

## Code for Efficient Extensible Discretization

This repository contains an initial low-level API library for the efficient
high-order discretization methods developed by the ECP co-design [Center for
Efficient Exascale Discretizations (CEED)](http://ceed.exascaleproject.org).
While our focus is on high-order finite elements, the approach is mostly
algebraic and thus applicable to other discretizations in factored form, as
explained in the [User manual](https://libceed.readthedocs.io/en/latest/) and the API implementation portion of the [documentation](https://libceed.readthedocs.io/en/latest/libCEEDapi.html).

One of the challenges with high-order methods is that a global sparse matrix is
no longer a good representation of a high-order linear operator, both with
respect to the FLOPs needed for its evaluation and the memory transfer
needed for a matvec.  Thus, high-order methods require a new "format" that still
represents a linear (or more generally non-linear) operator, but not through a
sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting
implementations and data structures, that enable efficient operator evaluation
on a variety of computational device types (CPUs, GPUs, etc.). This new operator
description is based on an algebraically [factored form](https://libceed.readthedocs.io/en/latest/libCEEDapi.html),
which is easy to incorporate into a wide variety of applications without significant
refactoring of their own discretization infrastructure.

The repository is part of the [CEED software suite][ceed-soft], a collection of
software benchmarks, miniapps, libraries and APIs for efficient exascale
discretizations based on high-order finite element and spectral element methods.
See http://github.com/ceed for more information and source code availability.

The CEED research is supported by the [Exascale Computing Project][ecp]
(17-SC-20-SC), a collaborative effort of two U.S. Department of Energy
organizations (Office of Science and the National Nuclear Security
Administration) responsible for the planning and preparation of a [capable
exascale ecosystem](https://exascaleproject.org/what-is-exascale), including
software, applications, hardware, advanced system engineering and early testbed
platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see http://ceed.exascaleproject.org/ceed-code/.

For detailed instructions on how to build libCEED and run benchmarks and examples, please see the dedicated [Getting Started](https://libceed.readthedocs.io/en/latest/gettingstarted.html) page in the [User manual](https://libceed.readthedocs.io/en/latest/). A short summary is provided here.

## Building

The CEED library, `libceed`, is a C99 library with no required dependencies and
with Fortran and Python interfaces.  It can be built using

    make

or, with optimization flags,

    make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'

These optimization flags are used by all languages (C, C++, Fortran), and this
makefile variable can also be set for testing and examples (below).
Python users can install using

    pip install libceed

or in a clone of the repository via `pip install .`.

The library attempts to automatically detect support for the AVX
instruction set using gcc-style compiler options for the host.
Support may need to be specified manually via

    make AVX=1

or

    make AVX=0

if your compiler does not support gcc-style options, if you are
cross-compiling, etc.

## Testing

The test suite produces [TAP](https://testanything.org) output and is run by:

    make test

or, using the `prove` tool distributed with Perl (recommended):

    make prove

## Backends

There are multiple supported backends, which can be selected at runtime in the examples:

| CEED resource            | Backend                                           |
| :----------------------- | :------------------------------------------------ |
| `/cpu/self/ref/serial`   | Serial reference implementation                   |
| `/cpu/self/ref/blocked`  | Blocked reference implementation                  |
| `/cpu/self/memcheck`     | Memcheck backend, undefined value checks          |
| `/cpu/self/opt/serial`   | Serial optimized C implementation                 |
| `/cpu/self/opt/blocked`  | Blocked optimized C implementation                |
| `/cpu/self/avx/serial`   | Serial AVX implementation                         |
| `/cpu/self/avx/blocked`  | Blocked AVX implementation                        |
| `/cpu/self/xsmm/serial`  | Serial LIBXSMM implementation                     |
| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation                    |
| `/cpu/occa`              | Serial OCCA kernels                               |
| `/gpu/occa`              | CUDA OCCA kernels                                 |
| `/omp/occa`              | OpenMP OCCA kernels                               |
| `/ocl/occa`              | OpenCL OCCA kernels                               |
| `/gpu/cuda/ref`          | Reference pure CUDA kernels                       |
| `/gpu/cuda/reg`          | Pure CUDA kernels using one thread per element    |
| `/gpu/cuda/shared`       | Optimized pure CUDA kernels using shared memory   |
| `/gpu/cuda/gen`          | Optimized pure CUDA kernels using code generation |
| `/gpu/magma`             | CUDA MAGMA kernels                                |

The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes
with a smaller number of high-order elements. The `/cpu/self/*/blocked` backends process
blocked batches of eight interlaced elements and are intended for meshes with larger numbers
of elements.

The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality.

The `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.

The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.

The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package
to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but
the Makefile is not detecting `MKLROOT`, linking libCEED against MKL can be
forced by setting the environment variable `MKL=1`.

The `/cpu/self/memcheck/*` backends rely upon the [Valgrind](http://valgrind.org/) Memcheck tool
to help verify that user QFunctions have no undefined values. To use them, run your code with
Valgrind and a Memcheck backend, e.g. `valgrind ./build/ex1 -ceed /cpu/self/memcheck`. A
'development' or 'debugging' version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to serial mode
if `/cpu/self/memcheck` is selected at runtime.

The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide
cross-platform performance.

The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.

The `/gpu/magma` backend relies upon the [MAGMA](https://bitbucket.org/icl/magma) package.
144