| #
ac421f39
|
| 17-Sep-2019 |
Yohann <dudouit1@llnl.gov> |
Improved performance of cuda-gen backend (#341)
Thanks-to: Tim Warburton
Some of these optimizations are the results of the knowledge and experience gathered by Tim Warburton and his team in libParanumal and then ported to libCEED.
* Add colocated gradient in 3D.
* Treat the qFunction by slice in 3d to avoid using too many registers.
* Minor fix
* Minor fix.
* Minor fix
* Compute the colocated gradient slice by slice.
* Add syncthreads after initialization of the matrices.
* Remove code print.
* Add a critical #pragma unroll
* Fix typo on "collocated".
* Remove dead code.
* Use ColloGrad3d functions.
* Fix cuda-gen backend when collocated gradient is not available.
* make style
* make style
* Add some comments.
* Replace int by CeedInt.
|
| #
241a4b83
|
| 25-Jul-2019 |
Yohann <yohann.dudouit@gmail.com> |
Full jit compiled operator: cuda-gen backend (#275)
* First steps toward cuda-gen backend!
* Closer to real code generation.
* Generated code should be ready for nvrtc.
* The code generation skeleton is ready.
* Hack with the qfunction to make the operator kernel compile.
* Some tweaks in the makefile + Input fields structure change.
* Remove using cout.
* 1d interp and grad device functions.
* 1d readDofs, readQuads, writeDofs, writeQuads.
* Remove dead code.
* readDofs, readQuads, writeDofs, writeQuads for 2d and 3d
* 2d interp and grad
* 3d interp and grad
* Weight functions for 1d, 2d, 3d.
* Link the indices to the kernel.
* Link the fields to the kernel.
* Link the basis to the kernel.
* Add the qFunction reader + inlining
* Add qf files for the tests.
* Add qf file for ceed/ex1
* Add qf file for mfem/bp1
* All tests pass.
* Add qFunction for mfem/bp3, petsc/bp1, and petsc/bp3.
* mfem/bp1 passes + remove dead code
* Fix a bug in n_quads_out for writeQuads
* mfem/bp3 passes.
* All tests and all examples pass.
* Temporary tweaks for mfem benchmarking
* Add Context management.
* Modify .qf files to take into account the context.
* Enable optimizations.
* First set of optimization for 2D and 3D.
* Makefile tweaks and destructor code.
* make style.
* Add -MP flag.
* Fix linking issues with the tests.
* Update .qf files for the tests.
* Add .qf files for nek5000 examples.
* Use shared memory for B and G matrices.
* Fix bug introduced in previous commit.
|