| 461525f5 | 17-Sep-2020 |
Natalie Beams <246972+nbeams@users.noreply.github.com> |
Consolidate CUDA backends (#623)
* Travis - allow icc failure for now
* move cuda-reg basis init kernels to cuda-shared
* move cuda-reg restrictions to cuda-ref
* change delegate ceeds for previous uses of cuda-reg
* remove cuda-reg backend
* update hip restrictions to match cuda
* update backends list in README
* make style
* update release notes for removal of cuda-reg
Co-authored-by: jeremylt <thompson.jeremy.luke@gmail.com>
|
| 18d499f1 | 17-Sep-2020 |
Yohann <dudouit1@llnl.gov> |
Enable under-integration for cuda-shared and cuda-gen backends (#620)
* Support under integration in cuda-shared.
* Add under-integration to the cuda-gen backend.
* Fix bugs when under-integ in cuda-shared.
* Factor some code.
* Factor some code in cuda-gen.
* Guard more carefully.
* Introduce T1d.
* Fix a bug in readQuads3d
* Fix bugs in 3D.
* Fix a typo
* Safety init.
* Try something with ContractZ3d.
* Guard the add
* revert add.
* Add more thread guards
* Same as previous
* Fix a bug in add.
* style.
* Check that the bases are tensor in cuda-gen.
* move isTensor
* Add T1d to cuda-gen and guard contractions.
* Fix typos.
* add guards in 1d.
* Rewrite weight functions.
* typo
* CUDA - fix cuda-gen collocated check
* make style.
Co-authored-by: jeremylt <thompson.jeremy.luke@gmail.com>
|
| 777ff853 | 14-Aug-2020 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
QFunction Context Data Object (#596)
* Ctx - create context object for QFunction context data
* Context - rename UserContext -> QFunctionContext
* Ctx - add lcov markers
* Ctx - fix leak in identity QFunctions
* Hip/Cuda - rename sync functions for vector/context
* Tests - lcov marker update
* QFunction - drop unused function
* Python - fix copy-paste errors
* Ctx - update notes for Fortran usage
* Fortran - drop unneeded cast
Co-authored-by: Jed Brown <jed@jedbrown.org>
* Interface - use void* for SetData interfaces
* Make - use call quiet for NVCC
* Interface - use void* for GetData interfaces
* Make - add quiet call option for examples
* Makefile - create common makefile to reduce duplication/complexity in example makefiles
Co-authored-by: Jed Brown <jed@jedbrown.org>
|
| d99fa3c5 | 28-Jul-2020 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Op - add interface for multigrid level creation (#579)
* Op - add interface for multigrid level creation
* Op - add implementation for OperatorMultigridLevelCreate
* make style
* make tidy
* Op - add test t550, fix errors
* Tests - add Fortran version of t550
* Tests - add t511 for testing tensor basis multigrid level setup
* make style and tidy
* Tests - fix t55* memory leaks
* Tests - add t552 for non-tensor basis multigrid levels
* CUDA - use CeedIntMax in shared CUDA backend
* Tests - add OCCA test exception for t55*
* Op - add lvector global prolongation multiplicity, simplifies user interface
* Solids - convert example to new interface
* make style
* Tests - convert t550 to multicomponent
* Solids - drop unused ceed_fine
* Python - add new multigrid level interface
* Python - fix operator wrap, use ceed python obj rather than ceed pointer
* Gallery - update comment slightly
* Tests - remove accidental duplicate test
* Multigrid - add size=2 case as well
* Operator - drop unneeded inline
* QFunction - simplify context ownership to match vector
* make style
* Python - update multigrid function signature
* Operator - refactor prolong/restrict qfunctions as scaling qfunctions
* Vector - add testing for reciprocal and add to Fortran/Python interfaces
* CUDA - add VectorReciprocal on device
* Gallery - drop specialized versions for 'Scale', will fix performance hit later
* Hip - add vector reciprocal
* Operator - add more flexible prolongation basis creation interface
* Vec - make sure data is set for VectorReciprocal
* Tests - drop ncomp for t550/1 so kernel is not too large for Magma backend
* Tests - add missing lcov markers
* make style
* Travis - allow ARM job to fail
* Travis - fix intel install
* Travis - try different install dir name for inteloneapi
* Travis - add ifort, ipp packages
* Tests - add missing lcov marker
|
| e9f4dca0 | 27-Jul-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
Cuda - add LCOV markers |
| af17f337 | 29-Jun-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
CUDA - reorder backend priority, lower is better |
| e75c1c2d | 29-Jun-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
make tidy |
| 1958eb7c | 26-Jun-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
CUDA - fix small leaks |
| 52d8ac88 | 25-Jun-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
CUDA - add missing codecov exceptions |
| 9525855c | 17-Jun-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
Ceed - add IsDeterministic |
| 0f70cdf6 | 25-Jun-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
CUDA - shared basis needs minimum of 1 elem per block |
| 7df94212 | 23-Jun-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
CUDA - clean up includes |
| 73b3ccaf | 23-Jun-2020 |
Jeremy L Thompson <thompson.jeremy.luke@gmail.com> |
CUDA - clean up minor memory leak |
| 49fd234c | 12-Jun-2020 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Convert CUDA ref/reg/shared E-Layout (#554)
* tests - update tests for multiple e-layouts
* CUDA - convert ref and reg restrictions to Q-layout
* CUDA - ref/reg/shared use gen/magma E-Layout for multi element basis apply and operator apply
* CUDA/MAGMA - drop eandqdiffer and separate MAGMA operator code
* CUDA - update operator comment
* reg - clarify read/write dofs/quads
* CUDA - drop dead code
|
| ab213215 | 23-Apr-2020 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
More comments in CUDA backends (#518)
* CUDA - adding comments as I work to understand these backends
* PETSc - remove extra include, breaks single source
* make style |
| 621cd461 | 16-Mar-2020 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #421 from SanderA/sanderarens/fix_ceed_cuda_subclasses
Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen. |
| 5afe0718 | 23-Nov-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
CUDA - fix up composite operator delegation |
| abfaacbb | 17-Nov-2019 |
Sander Arens <sanderarens@gmail.com> |
Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen.
Now Ceed_Cuda_ref/shared/gen act like subclasses and can be properly cast to Ceed_Cuda. |
| a7b7f929 | 16-Nov-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Basis - Use CEED_VECTOR_NONE for EVAL_MODE_WEIGHT |
| ccf0fe6f | 30-Oct-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
make style |
| cb0b5415 | 30-Oct-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Style - Fix indentation errors |
| 7f823360 | 16-Oct-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Make style |
| ac421f39 | 17-Sep-2019 |
Yohann <dudouit1@llnl.gov> |
Improved performance of cuda-gen backend (#341)
Thanks-to: Tim Warburton
Some of these optimizations are the results of the knowledge and experience gathered by Tim Warburton and his team in libParanumal and then ported to libCEED.
* Add colocated gradient in 3D.
* Treat the qFunction by slice in 3d to avoid using too many registers.
* Minor fix
* Minor fix.
* Minor fix
* Compute the colocated gradient slice by slice.
* Add syncthreads after initialization of the matrices.
* Remove code print.
* Add a critical #pragma unroll
* Fix typo on "collocated".
* Remove dead code.
* Use ColloGrad3d functions.
* Fix cuda-gen backend when collocated gradient is not available.
* make style
* make style
* Add some comments.
* Replace int by CeedInt.
|
| 288c0443 | 13-Sep-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
QFunction Create by Name (#311)
This PR adds a QFunction gallery to libCEED with 1D, 2D, and 3D mass and Poisson operators.
Closes issue #37, issue #340
* Add QFunction gallery, rename focca
* Gallery - add initial QFunctions
* Add a test for using the QF gallery
* Modify ex1 to use gallery
* Add multiple test configs to tap
* Move output to test directory
* Update junit
* Add OCCA gallery exception
* Add ex2
* Update ex2 for dim->ncompx
* Gallery - modify to work for CUDA as is
* Update Documentation
* Gallery - typo fix
* Gallery - convention change, postappend qfunction family variant
* Gallery - update template with new name checking convention
* Gallery - condense diff3DBuild QFunction
* Gallery - rename diff -> poisson
* Gallery - clarify poisson3DBuild comment
* Gallery - use Pragma SIMD, store Qdata in Voigt convention
* Examples - Convert BP3-6 to Voigt convention
* Examples - add cl option to switch between header and gallery qfs in CEED examples
* Examples - clean up construction of QF name
* Gallery - Switch to PascalCase for gallery names
* Doc - fix function type page
* Interface - Make sure strncpy result is null terminated
* Gallery - Update Poisson 2/3D Apply to new QF body
* make style
* make style - fix worst style problems
* make style - add gallery to make style
* Doc - update documentation errors and inconsistencies
* Examples - test ex1 ex2 with and without gallery
* Examples - reduce testing of ex1/ex2 without gallery, clean up non-gallery qfunctions
* MFEM - revert another make style mistake
* Manual make style updates
* Doc - update function documentation page
* Style updates, document test numbering conventions
* doc: resolve ambiguous image location warning, allow more Dot nodes
* Tests - style and cast cleanup
* Tests - fix README indentation
|
| 4d537eea | 02-Sep-2019 |
Yohann <dudouit1@llnl.gov> |
Single Source QFunction (#304)
Introduce a new macro CEED_QFUNCTION that allows QFunctions to be defined in a single source file, independently of the targeted backend.
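As an illustration of the single-source idea, the sketch below defines one pointwise function through a `CEED_QFUNCTION`-style macro. The typedefs and the macro body are simplified stand-ins (assumptions) so that the snippet compiles without libCEED; in real code they come from the library header. The function name `scale_by_qdata` and its field layout are hypothetical.

```c
/* Stand-ins (assumptions) so the sketch compiles without libCEED. */
typedef int CeedInt;
typedef double CeedScalar;
#define CEED_QFUNCTION(name)                           \
  static const char name##_loc[] = __FILE__ ":" #name; \
  static int name

/* Illustrative QFunction: v = qdata * u at each of the Q quadrature points.
   Inputs and outputs arrive as arrays of pointers, one per field. */
CEED_QFUNCTION(scale_by_qdata)(void *ctx, const CeedInt Q,
                               const CeedScalar *const *in,
                               CeedScalar *const *out) {
  (void)ctx; /* this sketch uses no context data */
  const CeedScalar *qdata = in[0], *u = in[1];
  CeedScalar *v = out[0];
  for (CeedInt i = 0; i < Q; i++) v[i] = qdata[i] * u[i];
  return 0; /* zero signals success */
}
```

Because the body is plain C with no backend-specific keywords, the same source text can in principle be compiled for the CPU or handed to a runtime compiler for a GPU backend, which is the point of the macro.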
Thanks-to: Jeremy Thompson
Thanks-to: Jed Brown
This work is the result of a fruitful discussion between Jed Brown, Jeremy Thompson and Yohann Dudouit. Jeremy Thompson also implemented important features in this commit and was very active and helpful all along the progress of this work.
[NEWS] Breaking change: QFunctionField parameter 'ncomp' changed to 'size'. This change requires setting the previous value of 'ncomp' to 'ncomp*dim' when adding a QFunctionField with eval mode 'CEED_EVAL_GRAD'.
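The arithmetic behind this breaking change can be sketched as a small helper. The enum and function names below (`EvalModeSketch`, `qfunction_field_size`) are hypothetical and exist only for illustration; they are not part of the libCEED interface.

```c
/* Hypothetical stand-ins so the sketch compiles on its own. */
typedef int CeedInt;
typedef enum { EVAL_INTERP_SKETCH, EVAL_GRAD_SKETCH } EvalModeSketch;

/* Field size under the new convention: a gradient field carries one value
   per component per spatial dimension; other fields carry one per component. */
static CeedInt qfunction_field_size(CeedInt ncomp, CeedInt dim,
                                    EvalModeSketch mode) {
  return mode == EVAL_GRAD_SKETCH ? ncomp * dim : ncomp;
}
```

For example, a scalar field (`ncomp = 1`) in 3D that was previously added with `ncomp = 1` and eval mode `CEED_EVAL_GRAD` would now be added with `size = 3`.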
* First steps toward cuda-gen backend!
* Closer to real code generation.
* Generated code should be ready for nvrtc.
* The code generation skeleton is ready.
* Hack with the qfunction to make the operator kernel compile.
* Some tweaks in the makefile + Input fields structure change.
* Remove using cout.
* 1d interp and grad device functions.
* 1d readDofs, readQuads, writeDofs, writeQuads.
* Remove dead code.
* readDofs, readQuads, writeDofs, writeQuads for 2d and 3d
* 2d interp and grad
* 3d interp and grad
* Weight functions for 1d, 2d, 3d
* Link the indices to the kernel
* Link the fields to the kernel
* Link the basis to the kernel
* Add the qFunction reader + inlining
* Add qf files for the tests.
* Add qf file for ceed/ex1
* Add qf file for mfem/bp1
* All tests pass.
* Add qFunction for mfem/bp3, petsc/bp1, and petsc/bp3.
* mfem/bp1 passes + remove dead code
* Fix a bug in n_quads_out for writeQuads
* mfem/bp3 passes.
* All tests all examples pass.
* Temporary tweaks for mfem benchmarking
* Add Context management.
* Modify .qf files to take into account the context.
* Enable optimizations.
* First set of optimization for 2D and 3D.
* double pointer format for the qFunction.
* Change the .qf files to have the same code as the C functions.
* Make previous Cuda backends use .qf files.
* Add a return value to qFunctions.
* Make cpu backends use .qf files.
* Minor: clean commented code.
* Add guarded math.h for petsc examples.
* Remove previous nek qf files.
* Remove .cu files.
* Remove .qf files.
* Remove dead code in the tests.
* make style
* Make style fix.
* more make style fixes.
* CEED_QFUNCTION - improve macro for CPU filenames
* Add CEED_QFUNCTION macro to navierstokes.c
* Fix PETSc gitignore
* Change default NS problemtype to density_current (#307) in navierstokes.c
* Fix petsc bp1.h
* Real Fix for petsc bp1.h...
* fix
* README - Add /gpu/cuda/gen
* PETSc - Update dmplex example to use *_loc
* cuda/reg - fix typo
* Revert a couple of small changes
* Fix a bug in mfem bp3 similar to the previous bug in petsc bp3.
* Make PETSc qfunctions look closer to master, and minor style for debugging.
* More uniformity changes
* Fix a strange CUDA_OUT_OF_RESSOURCE bug.
* NS - fix fname variables
* Use a different convention for qFunction ncomp.
* update cuda-gen backend and bpsdmplex.
* PETSc - style update
* update mfem bp1 and bp3.
* Interface - Use size instead of ncomp for QFunction fields
* update ceed example and tests.
* Tests - Update ncomp to size
* CPU Backends - Update ncomp to size
* CPU Backends - style
* Nek - Update ncomp to size
* Opt - fix style
* CUDA - update ncomp to size
* Doc - Update API documentation for QFunction \ncomp->size
* OCCA - Patch QFunction ncomp -> size, work but revamp will be better
* OCCA - assert dim>0 for clang-tidy
* CUDA - Change GetNumComp to GetSize
* Basis - Shift check for dim > 0 to interface
* Doc update
* Update NS field size
* NS - Fix problem options
|