| README.md (0a1d75a00eef2c2b2c9cdfbd3bcf319dba0408f2) | README.md (84a01de5ce080ac9cdd243d9d64da2df0ae9cb77) |
|---|---|
| 1# libCEED: the CEED API Library 2 3[](https://travis-ci.org/CEED/libCEED) 4[](https://codecov.io/gh/CEED/libCEED/) 5[](https://opensource.org/licenses/BSD-2-Clause) 6[](https://codedocs.xyz/CEED/libCEED/) 7 8## Code for Efficient Extensible Discretization --- 74 unchanged lines hidden --- 83## Backends 84 85There are multiple supported backends, which can be selected at runtime in the examples: 86 87| CEED resource | Backend | 88| :----------------------- | :------------------------------------------------ | 89| `/cpu/self/ref/serial` | Serial reference implementation | 90| `/cpu/self/ref/blocked` | Blocked reference implementation | | 1# libCEED: the CEED API Library 2 3[](https://travis-ci.org/CEED/libCEED) 4[](https://codecov.io/gh/CEED/libCEED/) 5[](https://opensource.org/licenses/BSD-2-Clause) 6[](https://codedocs.xyz/CEED/libCEED/) 7 8## Code for Efficient Extensible Discretization --- 74 unchanged lines hidden --- 83## Backends 84 85There are multiple supported backends, which can be selected at runtime in the examples: 86 87| CEED resource | Backend | 88| :----------------------- | :------------------------------------------------ | 89| `/cpu/self/ref/serial` | Serial reference implementation | 90| `/cpu/self/ref/blocked` | Blocked reference implementation | |
| 91| `/cpu/self/tmpl` | Backend template, dispatches to /cpu/self/blocked | 92| `/cpu/self/avx` | Blocked AVX implementation | | 91| `/cpu/self/tmpl` | Backend template, delegates to `/cpu/self/ref/blocked` | 92| `/cpu/self/avx/serial` | Serial AVX implementation | 93| `/cpu/self/avx/blocked` | Blocked AVX implementation | |
| 93| `/cpu/self/xsmm/serial` | Serial LIBXSMM implementation | 94| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation | 95| `/cpu/occa` | Serial OCCA kernels | 96| `/gpu/occa` | CUDA OCCA kernels | 97| `/omp/occa` | OpenMP OCCA kernels | 98| `/ocl/occa` | OpenCL OCCA kernels | 99| `/gpu/cuda` | Pure CUDA kernels | 100| `/gpu/magma` | CUDA MAGMA kernels | 101 102 103The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes 104with a smaller number of high order elements. The `/cpu/self/*/blocked` backends process 105blocked batches of eight interlaced elements and are intended for meshes with higher numbers 106of elements. 107 108The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality. 109 | 94| `/cpu/self/xsmm/serial` | Serial LIBXSMM implementation | 95| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation | 96| `/cpu/occa` | Serial OCCA kernels | 97| `/gpu/occa` | CUDA OCCA kernels | 98| `/omp/occa` | OpenMP OCCA kernels | 99| `/ocl/occa` | OpenCL OCCA kernels | 100| `/gpu/cuda` | Pure CUDA kernels | 101| `/gpu/magma` | CUDA MAGMA kernels | 102 103 104The `/cpu/self/*/serial` backends process one element at a time and are intended for meshes 105with a smaller number of high order elements. The `/cpu/self/*/blocked` backends process 106blocked batches of eight interlaced elements and are intended for meshes with higher numbers 107of elements. 108 109The `/cpu/self/ref/*` backends are written in pure C and provide basic functionality. 110 |
| 110The `/cpu/self/avx` backend relies upon AVX instructions to provide vectorized CPU performance. | 111The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance. |
| 111 112The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package 113to provide vectorized CPU performance. 114 115The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide 116cross-platform performance. 117 118The `/gpu/cuda` backend provides GPU performance strictly using CUDA. --- 125 unchanged lines hidden --- | 112 113The `/cpu/self/xsmm/*` backends rely upon the [LIBXSMM](http://github.com/hfp/libxsmm) package 114to provide vectorized CPU performance. 115 116The `/*/occa` backends rely upon the [OCCA](http://github.com/libocca/occa) package to provide 117cross-platform performance. 118 119The `/gpu/cuda` backend provides GPU performance strictly using CUDA. --- 125 unchanged lines hidden --- |
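
The README prose refers to groups of backends with wildcard shorthand such as `/cpu/self/*/serial` and `/*/occa`. As a rough illustration of which entries in the newer revision's table each shorthand covers, here is a minimal Python sketch. It treats the shorthand as a shell-style pattern, which is an assumption made for illustration only; nothing in the diff says libCEED itself accepts wildcards in resource strings. `RESOURCES` and `matching_resources` are hypothetical names, not part of libCEED.

```python
from fnmatch import fnmatchcase

# Resource names copied from the backend table in the newer revision.
RESOURCES = [
    "/cpu/self/ref/serial",
    "/cpu/self/ref/blocked",
    "/cpu/self/tmpl",
    "/cpu/self/avx/serial",
    "/cpu/self/avx/blocked",
    "/cpu/self/xsmm/serial",
    "/cpu/self/xsmm/blocked",
    "/cpu/occa",
    "/gpu/occa",
    "/omp/occa",
    "/ocl/occa",
    "/gpu/cuda",
    "/gpu/magma",
]


def matching_resources(pattern, resources=RESOURCES):
    """Return the resources matched by a shell-style wildcard pattern.

    Hypothetical helper: it only expands the README's shorthand and does
    not call into libCEED.
    """
    return [name for name in resources if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    for pattern in ("/cpu/self/*/serial", "/cpu/self/avx/*", "/*/occa"):
        print(pattern, "->", matching_resources(pattern))
```

Under this reading, `/cpu/self/avx/*` expands to the two AVX resources introduced by the change, while the old single name `/cpu/self/avx` no longer appears in the table.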