History log of /libCEED/backends/cuda-gen/ceed-cuda-gen-operator-build.cpp (Results 176 – 196 of 196)
Revision Date Author Comments
# 2f4ca718 05-Mar-2020 jeremylt <jeremy.thompson@colorado.edu>

CUDA - fix writeDofsStrided3d signature


# d80fc06a 24-Feb-2020 jeremylt <jeremy.thompson@colorado.edu>

CUDA - use strides as template parameters for cuda/gen


# 920dcdc4 14-Feb-2020 jeremylt <jeremy.thompson@colorado.edu>

CUDA - initial impl of strided restrictions in cuda/gen


# f2b2a896 14-Feb-2020 jeremylt <jeremy.thompson@colorado.edu>

CUDA - fix indices and strides arguments in cuda/gen


# ccedf6b0 11-Feb-2020 jeremylt <jeremy.thompson@colorado.edu>

WIP - add strided to cuda gen


# 4092d0ee 05-Feb-2020 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Merge pull request #447 from CEED/jeremy/lmode-on-create

ElemRestriction Lmode in Create over Apply


# 61dbc9d2 27-Jan-2020 jeremylt <jeremy.thompson@colorado.edu>

ElemRestriction - make lmode a separate enum


# a8d32208 24-Jan-2020 jeremylt <jeremy.thompson@colorado.edu>

ElemRestriction - move lmode to constructor over apply


# abfaacbb 17-Nov-2019 Sander Arens <sanderarens@gmail.com>

Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen.

Now Ceed_Cuda_ref/shared/gen act like subclasses and can be properly cast to Ceed_Cuda.


# 7af48cf9 17-Nov-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Merge pull request #417 from CEED/jeremy/none-args

None Args


# a7b7f929 16-Nov-2019 jeremylt <jeremy.thompson@colorado.edu>

Basis - Use CEED_VECTOR_NONE for EVAL_MODE_WEIGHT


# ac421f39 17-Sep-2019 Yohann <dudouit1@llnl.gov>

Improved performance of cuda-gen backend (#341)

Thanks-to: Tim Warburton
Some of these optimizations are the results of the knowledge and experience gathered by Tim Warburton and his team in libParanumal and then ported to libCEED.

* Add colocated gradient in 3D.

* Treat the qFunction by slice in 3d to avoid using too many registers.

* Minor fix

* Minor fix.

* Minor fix

* Compute the colocated gradient slice by slice.

* Add syncthreads after initialization of the matrices.

* Remove code print.

* Add a critical #pragma unroll

* Fix typo on "collocated".

* Remove dead code.

* Use ColloGrad3d functions.

* Fix cuda-gen backend when collocated gradient is not available.

* make style

* make style

* Add some comments.

* Replace int by CeedInt.


# ee07ded2 11-Sep-2019 Valeria Barra <39932030+valeriabarra@users.noreply.github.com>

Add CeedPragmaOMP to bps (#338)


* Convert petsc BP3&4 to loops

* Update petsc/bp4.h looping

* Switch to CeedPragmaSIMD and make examples/petsc/bp3.h consistent with bp4.h

Remove CeedPragmaOMP directive in Nek example and update documentation

* Remove restrict qualifier in petsc/bp3.h and update documentation


# 4d537eea 02-Sep-2019 Yohann <dudouit1@llnl.gov>

Single Source QFunction (#304)

Introduce a new macro CEED_QFUNCTION that allows qFunctions to be defined in a single source file, independently of the targeted backend.

Thanks-to: Jeremy Thompson
Thanks-to: Jed Brown
This work is the result of a fruitful discussion between Jed Brown, Jeremy Thompson and Yohann Dudouit. Jeremy Thompson also implemented important features in this commit and was very active and helpful throughout this work.

[NEWS] Breaking change: QFunctionField parameter 'ncomp' changed to 'size'. This change requires setting the previous value of 'ncomp' to 'ncomp*dim' when adding a QFunctionField with eval mode 'CEED_EVAL_GRAD'.

* First steps toward cuda-gen backend!

* Closer to real code generation.

* Generated code should be ready for nvrtc.

* The code generation skeleton is ready.

* Hack with the qfunction to make the operator kernel compile.

* Some tweaks in the makefile + Input fields structure change.

* Remove using cout.

* 1d interp and grad device functions.

* 1d readDofs, readQuads, writeDofs, writeQuads.

* Remove dead code.

* readDofs, readQuads, writeDofs, writeQuads for 2d and 3d

* 2d interp and grad

* 3d interp and grad

* - weight functions for 1d,2d,3d
- link the indices to the kernel
- link the fields to the kernel
- link the basis to the kernel

* Add the qFunction reader + inlining

* Add qf files for the tests.

* Add qf file for ceed/ex1

* Add qf file for mfem/bp1

* All tests pass.

* Add qFunction for mfem/bp3, petsc/bp1, and petsc/bp3.

* mfem/bp1 passes + remove dead code

* Fix a bug in n_quads_out for writeQuads

* mfem/bp3 passes.

* All tests and all examples pass.

* Temporary tweaks for mfem benchmarking

* Add Context management.

* Modify .qf files to take into account the context.

* Enable optimizations.

* First set of optimizations for 2D and 3D.

* double pointer format for the qFunction.

* Change the .qf files to have the same code as the C functions.

* Make previous Cuda backends use .qf files.

* Add a return value to qFunctions.

* Make cpu backends use .qf files.

* Minor: clean commented code.

* Add guarded math.h for petsc examples.

* Remove previous nek qf files.

* Remove .cu files.

* Remove .qf files.

* Remove dead code in the tests.

* make style

* Make style fix.

* more make style fixes.

* CEED_QFUNCTION - improve macro for CPU filenames

* Add CEED_QFUNCTION macro to navierstokes.c

* Fix PETSc gitignore

* Change default NS problemtype to density_current (#307) in navierstokes.c

* Fix petsc bp1.h

* Real Fix for petsc bp1.h...

* fix

* README - Add /gpu/cuda/gen

* PETSc - Update dmplex example to use *_loc

* cuda/reg - fix typo

* Revert a couple of small changes

* Fix a bug in mfem bp3 similar to the previous bug in petsc bp3.

* Make PETSc qfunctions look closer to master, and minor style for debugging.

* More uniformity changes

* Fix a strange CUDA_OUT_OF_RESSOURCE bug.

* NS - fix fname variables

* Use a different convention for qFunction ncomp.

* update cuda-gen backend and bpsdmplex.

* PETSc - style update

* update mfem bp1 and bp3.

* Interface - Use size instead of ncomp for QFunction fields

* update ceed example and tests.

* Tests - Update ncomp to size

* CPU Backends - Update ncomp to size

* CPU Backends - style

* Nek - Update ncomp to size

* Opt - fix style

* CUDA - update ncomp to size

* Doc - Update API documentation for QFunction \ncomp->size

* OCCA - Patch QFunction ncomp -> size, work but revamp will be better

* OCCA - assert dim>0 for clang-tidy

* CUDA - Change GetNumComp to GetSize

* Basis - Shift check for dim > 0 to interface

* Doc update

* Update NS field size

* NS - Fix problem options


# a62270dd 27-Aug-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Merge pull request #314 from CEED/jeremy/dof-to-node

Update DoF to Node and Style Changes


# 8795c945 22-Aug-2019 jeremylt <jeremy.thompson@colorado.edu>

Rename NDoF to NNodes and style updates


# 6aa3f2b3 21-Aug-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Merge pull request #310 from CEED/yohann/fix-atomicAdd

Fix atomicAdd in cuda-gen backend


# ea339264 19-Aug-2019 Yohann <yohann.dudouit@gmail.com>

Fix a bug in the description of vectorial bps in petsc (#309)

Also fix a bug introduced in cuda-gen backend to overcome the bug in the bps.


# 2acd9924 19-Aug-2019 Yohann Dudouit <yohann.dudouit@gmail.com>

Working on cc >= 6


# f1a13f77 19-Aug-2019 Yohann Dudouit <yohann.dudouit@gmail.com>

Remove atomicAdd function for compute capabilities > sm_60


# 241a4b83 25-Jul-2019 Yohann <yohann.dudouit@gmail.com>

Full jit compiled operator: cuda-gen backend (#275)

* First steps toward cuda-gen backend!

* Closer to real code generation.

* Generated code should be ready for nvrtc.

* The code generation skeleton is ready.

* Hack with the qfunction to make the operator kernel compile.

* Some tweaks in the makefile + Input fields structure change.

* Remove using cout.

* 1d interp and grad device functions.

* 1d readDofs, readQuads, writeDofs, writeQuads.

* Remove dead code.

* readDofs, readQuads, writeDofs, writeQuads for 2d and 3d

* 2d interp and grad

* 3d interp and grad

* - weight functions for 1d,2d,3d
- link the indices to the kernel
- link the fields to the kernel
- link the basis to the kernel

* Add the qFunction reader + inlining

* Add qf files for the tests.

* Add qf file for ceed/ex1

* Add qf file for mfem/bp1

* All tests pass.

* Add qFunction for mfem/bp3, petsc/bp1, and petsc/bp3.

* mfem/bp1 passes + remove dead code

* Fix a bug in n_quads_out for writeQuads

* mfem/bp3 passes.

* All tests and all examples pass.

* Temporary tweaks for mfem benchmarking

* Add Context management.

* Modify .qf files to take into account the context.

* Enable optimizations.

* First set of optimizations for 2D and 3D.

* Makefile tweaks and destructor code.

* make style.

* Add -MP flag.

* Fix linking issues with the tests.

* Update .qf files for the tests.

* Add .qf files for nek5000 examples.

* Use shared memory for B and G matrices.

* Fix bug introduced in previous commit.

