| 2acd9924 | 19-Aug-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Working on cc >= 6 |
| f1a13f77 | 19-Aug-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove atomicAdd function for compute capabilities > sm_60 |
| 241a4b83 | 25-Jul-2019 |
Yohann <yohann.dudouit@gmail.com> |
Full jit compiled operator: cuda-gen backend (#275)
* First steps toward cuda-gen backend!
* Closer to real code generation.
* Generated code should be ready for nvrtc.
* The code generation skeleton is ready.
* Hack with the qfunction to make the operator kernel compile.
* Some tweaks in the makefile + Input fields structure change.
* Remove using cout.
* 1d interp and grad device functions.
* 1d readDofs, readQuads, writeDofs, writeQuads.
* Remove dead code.
* readDofs, readQuads, writeDofs, writeQuads for 2d and 3d
* 2d interp and grad
* 3d interp and grad
* - weight functions for 1d,2d,3d
- link the indices to the kernel
- link the fields to the kernel
- link the basis to the kernel
* Add the qFunction reader + inlining
* Add qf files for the tests.
* Add qf file for ceed/ex1
* Add qf file for mfem/bp1
* All tests pass.
* Add qFunction for mfem/bp3, petsc/bp1, and petsc/bp3.
* mfem/bp1 passes + remove dead code
* Fix a bug in n_quads_out for writeQuads
* mfem/bp3 passes.
* All tests all examples pass.
* Temporary tweaks for mfem benchmarking
* Add Context management.
* Modify .qf files to take into account the context.
* Enable optimizations.
* First set of optimization for 2D and 3D.
* Makefile tweaks and destructor code.
* make style.
* Add -MP flag.
* Fix linking issues with the tests.
* Update .qf files for the tests.
* Add .qf files for nek5000 examples.
* Use shared memory for B and G matrices.
* Fix bug introduced in previous commit.
|
| 706bc5e6 | 18-Jul-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
backends: fix ref backend priorities |
| 6f7d248d | 12-Jul-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Update CPU backends to give default for /cpu/self/*** |
| e0fc0447 | 12-Jul-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Fix resource strcmp in xsmm backends |
| f405f806 | 04-Jul-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #289 from CEED/cuda-occa-copy-vals
Update CUDA/OCCA CEED_COPY_VALUES logic |
| ea03cb95 | 03-Jul-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Update CUDA/OCCA CEED_COPY_VALUES logic |
| 1226057f | 27-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Merge branch 'master' into yohann/cuda-restr-opt
Conflicts:
  backends/cuda-reg/ceed-cuda-reg-restriction.c
  backends/cuda-shared/ceed-cuda-shared-basis.c |
| 9d77422e | 26-Jun-2019 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'yohann/cuda-non-tensor' [PR #249]
* yohann/cuda-non-tensor:
  ceed-cuda: resolve -Wsign-compare for CUresult (unsigned enum) in CeedError
  make style.
  namespace cuda backends functions.
  Minor: styling
  Add CUDA_LIB_DIR_STUBS for systems that don't have CUDA drivers installed
  make style
  Remove useless function declaration.
  Add a reference non-tensor BasisApply for cuda backends.
|
| ab7ab560 | 23-Jun-2019 |
Jed Brown <jed@jedbrown.org> |
ceed-cuda: resolve -Wsign-compare for CUresult (unsigned enum) in CeedError |
| 961116ec | 17-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
make style. |
| 4a6d4bbd | 17-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
namespace cuda backends functions. |
| 0109ba86 | 04-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Minor: styling |
| a7bd39da | 10-Jun-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Fix underinterpolation mode for /cpu/self backends |
| df4cfd6d | 04-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove dead or unnecessary code. |
| 3f63d318 | 04-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove dead code. Cuda-reg restriction optimization. |
| 698ebc35 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of 3D kernels for cuda-shared backend. |
| d94769d2 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of 1D kernels for cuda-shared backend. |
| 4247ecf3 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of 2D kernels for cuda-shared backend. |
| 717ff8a3 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Minor bug fix |
| 074be161 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of weight kernel and dynamic allocation of shared memory.
- First optimization of weight kernel, kernels are now coalesce but might not be fully using SMs (need to batch elements per block)
- Switch to dynamic shared memory allocation in order to batch elements for interpolation and gradient in cuda-shared backend.
- Add GetPreferedMemoryType for cuda-reg and cuda-shared backends. (Can be removed in the future with delegation of this function)
|
| d3232bb7 | 30-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of cuda-reg restriction. |
| 9ef20713 | 17-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Start the optimization of the Cuda restriction operator. |
| 103dcb42 | 31-May-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
OCCA backend update note |