| bec1c034 | 30-May-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
NOLINT for OCCA tensor contract false positive |
| f5ef5ec0 | 29-May-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
OCCA Backend clang-tidy fixes |
| a4999edd | 24-May-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Update Ceed Delegate referencing |
| aefd8378 | 29-Apr-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Add delegates for specific objects |
| f8902d9e | 24-May-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
VecCreate -> VectorCreate |
| 89c6efa4 | 03-May-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Use blocking in optimized serial backends |
| 045b9c47 | 29-Mar-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Include full evec blocked backend |
| a7652942 | 28-Mar-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Add restriction by block to /cpu/self/*/blocked |
| be9261b7 | 28-Mar-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Add ElemRestrictionApplyBlock |
| abe33e54 | 16-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
make style |
| 469f0220 | 16-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove useless function declaration. |
| 9ad45357 | 16-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Add a reference non-tensor BasisApply for cuda backends. |
| c532df63 | 16-May-2019 |
Yohann <dudouit1@llnl.gov> |
Cuda backend using shared memory (#247)
Add a GPU backend based on Cuda using shared memory.
* Draft of a shared memory backend
* New basis apply passes all tests.
* Add the possibility to treat several elements in one block of threads.
* Fix an error in 2D and 3D gradient.
* Put the cuda-shared backend in its own folder.
* Minor cleaning.
* Replace <ceed-impl.h> with <ceed-backend.h>
* make style
* Add a few CeedChk_Cu
|
| 8d75ea1b | 18-Apr-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Fix include statements |
| fc7cf9a0 | 18-Apr-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Set QFunction outputs undefined before apply in new memcheck backend |
| 30ea05eb | 06-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Force Context existence with cudaFree(0). |
| 5e9d07a7 | 06-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Modify the device initialization |
| 974a6da5 | 29-Apr-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Fix CeedChk with CeedChk_Cu in the Cuda backend. |
| 56f1838c | 28-Mar-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Add atomicAdd in /cuda/ref backend for compute capability < 6.0 |
| c907536f | 27-Mar-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Add CeedGetPreferredMemType |
| 656dd4b7 | 24-Mar-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Add error message if XSMM kernel fails to build |
| 3d0fd664 | 21-Mar-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Add kernel caching to XSMM backend
Make style and comment updates
XSMM tensor ind logic fix
Logic cleanup |
| c71e1dcd | 20-Mar-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Add Basis argument to TensorContractCreate |
| de686571 | 14-Mar-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Small clang-tidy fixes (#215) |
| 55ae60f9 | 14-Mar-2019 |
Yohann <yohann.dudouit@gmail.com> |
Simple Cuda backend using one thread per element (#195)
Thanks-to: Jeremy Thompson
* Take into account the compute capability of the GPU
* Add the cuda/reg backend and rename cuda to cuda/ref.
- cuda/reg uses a simple approach where each element is
processed by one thread. This approach is expected to be
efficient for 1D and 2D problems, but very inefficient
as soon as the kernels start to spill, which should arise
around Q1D=4 for 3D problems.
* Compilation takes into account the deviceId
* Make style
* Remove dead code in cuda qFunctions.
* Cuda-reg specialized Restriction.
* Split the Prolongation operator into Identity/not Identity.
* Remove "#pragma unroll" until further perf investigation.
* README update
* Add a description of cuda/reg.
* Add CompositeOperator msg to CUDA backends
|