| 49aac155 | 24-Mar-2023 |
Jeremy L Thompson <jeremy@jeremylt.org> |
IWYU fixes (#1182)
* iwyu - include fixes
* iwyu - silence some iwyu output
* minor - clearer macro names
* iwyu - fix suggestion of "ceed/ceed.h" externally
* iwyu - lighter petsc headers
* iwyu - ceed/ceed.h -> ceed.h
* iwyu - cuda/hip include fixes
|
| 131837e7 | 14-Mar-2023 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #1172 from CEED/jeremy/more-tests
Spelling and Ceed example consistency |
| 6fb6c846 | 08-Mar-2023 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - add missing backend impl method link |
| 5fb68f37 | 08-Mar-2023 |
Karen (Ren) Stengel <karenlstengel@gmail.com> |
Adding CeedVectorCopy() and CeedVectorAXPBY() (#1170)
* adding CeedVectorAXPBY and CeedVectorCopy functions with Rust, Python, and CUDA/HIP backend support
---------
Co-authored-by: James Wright <james@jameswright.xyz>
Co-authored-by: Adeleke O. Bankole <86932837+AdelekeBankole@users.noreply.github.com>
Co-authored-by: Jed Brown <jed@jedbrown.org>
|
| ecc88aeb | 06-Mar-2023 |
Jeremy L Thompson <jeremy@jeremylt.org> |
spelling - spelling fixes in core |
| 28ec399d | 16-Feb-2023 |
Jeremy L Thompson <jeremy@jeremylt.org> |
pc - fix read/write access logic for full assembly |
| 9798701e | 15-Feb-2023 |
Jeremy L Thompson <jeremy@jeremylt.org> |
memcheck - add vector read/write checking |
| 023b8a51 | 25-Jan-2023 |
abdelfattah83 <36712794+abdelfattah83@users.noreply.github.com> |
magma: non-tensor rtc (#1141)
* some refactoring in magma's jit src
* fix path
* fix loading src
* refactor magma nontensor backend
* refactor magma nontensor backend
* [WIP]: new nontensor basis kernels
* [WIP]: new nontensor basis kernels
* [WIP]: new nontensor basis kernels
* call the new nontensor kernels for low order problems
* multiple compilation for the same kernels but with different tuning parameters
* magma: allow different nb's for different non-tensor kernels
* tuning data for the non-tensor rtc kernels
* remove no-longer used functions, add new one for tuning the nontensor kernels
* constants for tuning
* tuning functions
* use the tuning functions in compiling/running the new kernels
* bug fix
* fixes
* fixes
* minor
* switch tuning data
* fix name
* fix name
* add function to run cuda kernels with opt-in shared memory feature
* minor fix
* minor fix
* fix calls to batch api
* allow more kernel instances
* temporary timing function
* temporary timing function
* tuning data based on hiprtc
* rollback tuning parameters
* fixes
* fixes
* fix inconsistency in the parameters passed to nvrtc/hiprtc
* minor
* a fix to the nb selector
* cleanup
* merge the opt-in feature in CeedRunKernelDimSharedOptinCuda into CeedRunKernelDimSharedCuda
* fix paths for hip-magma backends
* style
* fixes
* running make format
* undo changes from the last commit
* change HIP_DIR to ROCM_DIR and adjust the paths for magma accordingly
* replace HIP_DIR with ROCM_DIR
|
| ea61e9ac | 30-Nov-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
minor - assorted formatting fixes |
| a915a514 | 17-Nov-2022 |
rezgarshakeri <rezgar.shakeri@colorado.edu> |
Updated CEED_EVAL_DIV in ref/opt/blocked backends |
| 2b730f8b | 17-Nov-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Switch to clang-format (#1051)
* style - switch to clang-format
* ci - use newer libxsmm
* action - update format action
* format - consistent use of {} for multi-line if/for
* make - remove stray newline
* make - simpler 'make format' target
* ci - use newer libxsmm
* doc - minor release note clarification
* minor - minor fix
* minor - minor fix
* minor - minor fix
* minor - minor fix
* make format
* format - less aggressive alignment rules
* tidy - check for argument name mismatches
* fix newline
* format - mirror Ratel update to .clang-format
* fix merge error
* fix merge conflict
* fix merge error
* drop style in .phony list
* Update .clang-format
Co-authored-by: Jed Brown <jed@jedbrown.org>
* apply updated format
Co-authored-by: Jed Brown <jed@jedbrown.org>
|
| fa8c78e2 | 14-Nov-2022 |
Zach Atkins <zacharyjayhawk@gmail.com> |
Update error for multi-field diagonal assembly (#1089)
* Update error for multi-field diagonal assembly
* update error message for multi-field non-composite operator assembly |
| 204bfdd7 | 03-Nov-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
minor - clearer GPU QF/Op kernel names |
| 0be03a92 | 13-Oct-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Update OCCA Backend (#1072)
* Update OCCA memory interop call.
* Removes deprecated kernelBuilder calls; builds directly from the device.
* Uses `std::memset` and `std::memcpy`.
* Uses `std::to_string` instead of internal `occa::toString`.
* Uses `std::memcpy`.
* Sets kernel properties.
* Removes deprecated kernelBuilder calls; builds directly from the device.
* Removes deprecated kernelBuild call; builds directly from the device.
* Uses `std::memcpy`.
* Removes deprecated calls to `occa::linalg`.
* Add registration and device configuration for DPC++ and OpenCL backends.
* Add DPC++ and OpenCL backends to makefile.
* Spelling.
* Configure build flags for oneAPI compilers.
* Correctly set mode in `occa::json` object.
* Add missing functions to OCCA CeedVector implementation.
* Adds missing call to `setValueKernel` in OCCA CeedVector impl.
* Gets occa device function name from the CeedQFunction.
* Adds OpenCL and DPC++ to backends list.
* Uses unique kernel name for OCCA qFunction kernels.
* Adds a dummy `ceed.h` header in include in OCCA kernels.
* Rewrite arrays of structs in format that OCCA can handle.
* Adds stubs for missing functions in OCCA qfunctioncontext.
* Includes the cmath header when compiling C++ code.
* Add stubs for missing OCCA backend LinearAssembleXXX functions.
* Adds missing functions to OCCA implementation of qFunctionContext.
* Removes math function headers which were causing OCCA JIT failures.
* Rewrite arrays of structs in format that OCCA can handle.
* Rewrites fluids example qfunctions to be compatible with OCCA.
* Fixes array dimensions in mass2dbuild.
* Rewrites advection problem kernels to work with OCCA.
* Rewrites blasius problem kernels to work with OCCA.
* Rewrites channel problem kernels to work with OCCA.
* Rewrites dirichlet bc kernels to work with OCCA.
* Rewrites newtonian kernels to work with OCCA.
* Rewrites setupgeo kernels to be compatible with OCCA.
* Rewrites stabilization kernels to be compatible with OCCA.
* Rewrites stg kernels to be compatible with OCCA.
* Adds occa backends to tests for the fluids example.
* doc - update OCCA info in release notes + README
* ci - run with OCCA v1.4
* occa - update copyright boilerplate
* occa - drop unused define
* ci - fix OCCA install
* wip
* ci - fix occa skip list
* make/ci - fix use of OCCA_DIR/bin/occa
* makefile - minor style
* ci - re-enable OCCA dir caching
* doc - update release notes to mention OCCA QF workarounds
Co-authored-by: Kris Rowe <kris.rowe@anl.gov>
|
| 9e201c85 | 23-Sep-2022 |
Yohann <dudouit1@llnl.gov> |
Refactor `cuda-gen` and `hip-gen` backends. (#1050)
* Add TODO items.
* rough, but something like this?
* wip - cleaning up some warnings, but more remain
* wip - reorganize
* wip - missing kernels
* wip - replace t1d
* fix some kernels
* another typo
* more
* another one
* closer
* define T_1D
* typosgit add .!
* WIP: changes to cuda-shared framework for new kernels
* fix output writing
* buffer fix
* buffer sizes
* WIP: fixes for 2 and 3D basis kernels
* minor
* fix weight kernel for 3d
* remove debugging output
* minor reorg
* fix includes
* enable collo grad for cuda-shared
* move quoted kernels
* renaming
* missed a rename
* small fix
* more naming consistency
* faster 'useCollograd=false' path in *-gen
* more style
* one last style fix
* clearer collograd condition
* Add gen basis kernels to hip-shared
* Try some changes to hip-shared basis block sizes for new kernels
* cuda - drop extra kernel arg
* cuda - fix collograd check logic
* update gen comment about parallelization
* tidy up fields struct definition
* tidy up structs even more
* Update hip-gen basis templates use and move other hip-gen device functions to jit-source
* Finish hip-gen basis template update; small style updates to match CUDA
* missing isStrided
* Update block size used in 3D weight for new shared kernels
* update release notes
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Co-authored-by: nbeams <246972+nbeams@users.noreply.github.com>
|
| dc64899e | 06-Sep-2022 |
Yohann <dudouit1@llnl.gov> |
Change the initialization logic for `useCollograd`. (#1021)
* Change the initialization logic for `useCollograd`.
* Guard useCollograd for 3D only.
* Propagate `useCollograd` change to `hip-gen`.
* Update backends/hip-gen/ceed-hip-gen-operator-build.cpp
* Propagate changes to `hip-gen`.
* Revert redimensioning of `r_tt`.
Co-authored-by: Jed Brown <jed@jedbrown.org>
|
| 2dc3fb5f | 31-Aug-2022 |
abdelfattah83 <36712794+abdelfattah83@users.noreply.github.com> |
Icl/magma ntgemm (#1060)
* tuning data and driver for the non-tensor gemm
* header
* update magma non-tensor sgemm/dgemm to use the gemm selector
* add cpp files for the magma backend
* minor fix
* define CEED_INTERN for every function instead of a block definition
* include tuning data for CUDA or HIP only
* recent tuning data for a100 and mi250x
* style
* remove unused declarations
* expand tuning data for v100 and mi100
* switch to std array instead of std vector for individual records
* choose between gfx90a and gfx908 for HIP
* bug fix: choose between magma and vendor blas in non-batch mode
* style
|
| 01005eab | 30-Aug-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #1053 from CEED/natalie/fix-magma-jit-mem
Fix small memory leaks in JIT source code management |
| 2ba3f748 | 27-Aug-2022 |
rezgarshakeri <rezgar.shakeri@colorado.edu> |
Freed orient's array |
| 03f90b05 | 26-Aug-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
magma: free memory used in loading jit kernel source |
| 0df8cb37 | 16-Aug-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
hip - guard hipblas header include for HIP_VERSION |
| c9c2c079 | 05-Aug-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
QF headers for typedefs and macros (#1036)
* jit - qf headers for typedefs and macros
* jit - smaller list of permitted files
* ceed - only include ceed.h in QF source |
| 6d1815bb | 07-Jul-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
memcheck - add missing CeedChk |
| e8001fe0 | 07-Jul-2022 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #1009 from CEED/jrwrigh/dirichlet_with_libceed
Fluids - Use libCEED to compute Dirichlet boundary conditions |
| 3b0d37b7 | 07-Jul-2022 |
Jed Brown <jed@jedbrown.org> |
{cuda,hip}/gen: fix incorrect quadrature points when all bases are collocated
https://github.com/CEED/libCEED/pull/1009#issuecomment-1176751436
Co-authored-by: Natalie Beams <nbeams@icl.utk.edu> |