| #
2b730f8b
|
| 17-Nov-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Switch to clang-format (#1051)
* style - switch to clang-format
* ci - use newer libxsmm
* action - update format action
* format - consistent use of {} for multi-line if/for
* make - remove stray newline
* make - simpler 'make format' target
* ci - use newer libxsmm
* doc - minor release note clarification
* minor - minor fix
* minor - minor fix
* minor - minor fix
* minor - minor fix
* make format
* format - less aggressive alignment rules
* tidy - check for argument name mismatches
* fix newline
* format - mirror Ratel update to .clang-format
* fix merge error
* fix merge conflict
* fix merge error
* drop style in .phony list
* Update .clang-format
Co-authored-by: Jed Brown <jed@jedbrown.org>
* apply updated format
Co-authored-by: Jed Brown <jed@jedbrown.org>
|
| #
9e201c85
|
| 23-Sep-2022 |
Yohann <dudouit1@llnl.gov> |
Refactor `cuda-gen` and `hip-gen` backends. (#1050)
* Add TODO items.
* rough, but something like this?
* wip - cleaning up some warnings, but more remain
* wip - reorganize
* wip - missing kernels
* wip - replace t1d
* fix some kernels
* another typo
* more
* another one
* closer
* define T_1D
* typos
* WIP: changes to cuda-shared framework for new kernels
* fix output writing
* buffer fix
* buffer sizes
* WIP: fixes for 2 and 3D basis kernels
* minor
* fix weight kernel for 3d
* remove debugging output
* minor reorg
* fix includes
* enable collo grad for cuda-shared
* move quoted kernels
* renaming
* missed a rename
* small fix
* more naming consistency
* faster 'useCollograd=false' path in *-gen
* more style
* one last style fix
* clearer collograd condition
* Add gen basis kernels to hip-shared
* Try some changes to hip-shared basis block sizes for new kernels
* cuda - drop extra kernel arg
* cuda - fix collograd check logic
* update gen comment about parallelization
* tidy up fields struct definition
* tidy up structs even more
* Update hip-gen basis templates use and move other hip-gen device functions to jit-source
* Finish hip-gen basis template update; small style updates to match CUDA
* missing isStrided
* Update block size used in 3D weight for new shared kernels
* update release notes
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Co-authored-by: nbeams <246972+nbeams@users.noreply.github.com>
|
| #
18562a3a
|
| 08-Apr-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #935 from CEED/jeremy/install-folder
Install qf and kernels
|
| #
ee5a26f2
|
| 04-Apr-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
jit - add interface for adding additional jit source dirs
|
| #
a0154ade
|
| 04-Apr-2022 |
Jed Brown <jed@jedbrown.org> |
move include/ceed-jit-source to include/ceed/jit-source
|
| #
6eb0d8b4
|
| 01-Apr-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
jit - use relpath from include/ceed-jit-source for jit source files
|
| #
ce18bed9
|
| 17-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #858 from CEED/jeremy/dump-copy-stuff
Strip redundant/outdated license info duplication
|
| #
3d8e8822
|
| 17-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
minor - update copyright headers
|
| #
60224bc5
|
| 14-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #913 from CEED/jeremy/coo-ptrdiff
Create CeedSize as ptrdiff_t
|
| #
1f9221fe
|
| 11-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
vec - use CeedSize for vector lengths
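A minimal sketch of why a signed, pointer-width length type helps here. On a typical 64-bit platform, a vector length beyond `INT_MAX` still fits in `ptrdiff_t`, while a plain `int` would overflow. The `CeedSize` typedef below is a local stand-in that assumes it aliases `ptrdiff_t`, as the merge title above states. The helper `vec_length` is hypothetical and does not come from the libCEED headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring "CeedSize as ptrdiff_t"; not copied from libCEED. */
typedef ptrdiff_t CeedSize;

/* Hypothetical helper: length of a vector with num_comp components at each of
   num_nodes nodes. Computing in CeedSize (signed, pointer-width on common 64-bit
   platforms) avoids the overflow a 32-bit int hits once the product exceeds INT_MAX. */
static CeedSize vec_length(CeedSize num_nodes, CeedSize num_comp) {
  assert(num_nodes >= 0 && num_comp >= 0);
  /* Guard against overflow of the product itself. */
  assert(num_comp == 0 || num_nodes <= PTRDIFF_MAX / num_comp);
  return num_nodes * num_comp;
}
```

For example, 1.5e9 nodes with 3 components gives 4.5e9 entries. That value is representable as a 64-bit `CeedSize` but not as an `int`.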
|
| #
e2cfdb03
|
| 18-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #902 from CEED/jed/cuda-block-sizes
backends cuda-shared: fix launch bounds to avoid invalid z dimension
|
| #
e6f67ff7
|
| 18-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
backends cuda-shared: fix launch bounds to avoid invalid z dimension
The typical max z dimension size of a thread block is 64 and we were computing larger values (like 85) in some cases.
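One way to see how a z dimension like 85 can arise: suppose elements per block are derived by dividing a thread budget by the threads each element needs. Then a small per-element thread count gives a large quotient. For instance, a budget of 512 threads with 6 threads per element yields 512 / 6 = 85 under integer division. The sketch below clamps that value to 64. The helper name and the 512-thread budget are assumptions for illustration, not the backend's actual code. Real code would query the limit rather than hard-code it, for example with `cudaDeviceGetAttribute` and `cudaDevAttrMaxBlockDimZ`.

```c
#include <assert.h>

/* Typical maximum blockDim.z on CUDA devices; real code would query it,
   e.g. via cudaDeviceGetAttribute with cudaDevAttrMaxBlockDimZ. */
#define MAX_BLOCK_DIM_Z 64

/* Hypothetical helper: number of elements to place along the z dimension of a
   thread block, given a total thread budget and the threads needed per element. */
static int elems_per_block_z(int thread_budget, int threads_per_elem) {
  assert(thread_budget > 0 && threads_per_elem > 0);
  int elems = thread_budget / threads_per_elem;          /* e.g. 512 / 6 = 85 */
  if (elems < 1) elems = 1;                              /* always launch at least one element */
  if (elems > MAX_BLOCK_DIM_Z) elems = MAX_BLOCK_DIM_Z;  /* stay within the z limit */
  return elems;
}
```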
|
| #
8b036261
|
| 16-Feb-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #899 from CEED/jed/cu-lv-cuda-11.6
CI: update lv for cuda-11.6
|
| #
c47bfe2b
|
| 16-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
backends/cuda-shared: limit 1D thread counts
We need to avoid this error:
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: max_threads_per_block 512 on block size (24,1,32), shared_size 0, num_regs 106
A proper solution is to use cuOccupancyMaxPotentialBlockSize to place a number of elements per block that stays within resource limits. This would involve a bit more refactoring to do cleanly.
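The arithmetic behind the error above: a block of (24, 1, 32) threads holds 24 × 1 × 32 = 768 threads. That exceeds the 512 threads per block the compiled kernel allows at 106 registers per thread. A simple interim fix is to cap the number of elements per block using the kernel's reported maximum.

The sketch below assumes that x and y cover one element's threads and z counts elements. In real code, `max_threads_per_block` would come from the kernel itself, for example `cuFuncGetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK`. The helper name is hypothetical. The `cuOccupancyMaxPotentialBlockSize` approach mentioned in the commit would instead let the driver suggest a block size.

```c
#include <assert.h>

/* Hypothetical helper: largest number of elements (block z dimension), up to
   requested_elems, such that threads_x * threads_y * elems <= max_threads_per_block. */
static int fit_elems_per_block(int threads_x, int threads_y, int requested_elems,
                               int max_threads_per_block) {
  assert(threads_x > 0 && threads_y > 0 && requested_elems > 0);
  int threads_per_elem = threads_x * threads_y;
  int max_elems = max_threads_per_block / threads_per_elem; /* e.g. 512 / 24 = 21 */
  /* If even one element exceeds the limit, return 1; that launch would still fail. */
  if (max_elems < 1) max_elems = 1;
  return requested_elems < max_elems ? requested_elems : max_elems;
}
```

With the numbers from the error message, this yields 21 elements, i.e. a (24, 1, 21) block of 504 threads.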
|
| #
51d630a3
|
| 24-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #864 from CEED/jeremy/gpu-templates
GPU - pull quoted kernels into separate files
|
| #
d7d111ec
|
| 23-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - style consistency
|
| #
46dc0734
|
| 23-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - improved human-readability of debugging output
|
| #
437930d1
|
| 22-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - pull quoted kernels into separate files
|
| #
d92fedf5
|
| 22-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #863 from CEED/jeremy/gpu-jit-code
GPU - separate common code into separate folder
|
| #
6d69246a
|
| 21-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
cuda - separate compile functionality into new header
|
| #
9c774edd
|
| 17-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
vec/qf - initial valid/borrowed/owned split for data (#853)
* vec/qf - initial valid/borrowed/owned split for data
* vec/qf - tidy logic for checking active/stale data
* minor - add missing NULL
* doc - explain VectorTakeArray update
* minor - update error messages
* test - update error message in junit/tap
* gpu - fix stray CeedScalar vs void for QFunctionContext
* vec/qf - clarify/simplify access logic
* vec - calloc host arrays when no value set to make empty
* style - minor
* style - minor
* minor - fix error messages
* vec/qf - move data validity checking to backend interface
* gpu - add missing sync error checking for qfcontext
* gpu - homogenize use of impl for backend data to reduce confusion
* vec - clarify access conditions
* python - update test for stricter vector access
* vec - minor fixes
* minor - fix ipython change
* vec - add missing declarations in ceed/backend.h
* ctx - mirror vector borrowed data check in ctx interface
* vec - add CeedVectorGetArrayWrite
* vec - consistent use of CeedVectorGetArray vs CeedVectorGetArrayWrite
* python - small vec fixes
* doc - describe vector data semantics
* magma - update restriction
* gpu - fix restr bug I added, need to sum into target
* magma - fix restriction bug
* cpu - fix restriction bug here too
* op - fix evec allocations
* julia - fix ElemRestriction for new vector access rules
* op - double check GetArray vs Read vs Write usage
* doc - small fix
* restr - clean up read/write logic for restr
* python - add vec.array_write
* magma - typo fix
|
| #
80a9ef05
|
| 02-Sep-2021 |
Natalie Beams <246972+nbeams@users.noreply.github.com> |
Allow CeedScalar to be single precision (#788)
One can modify `ceed.h` to include `ceed-f32.h` and then use single precision. This is tested for C in CI and has been tested by developers with Rust, Julia, and Python. This interface is evolving and should be considered experimental at this time (hence the lack of automated build support).
* Introduce CeedScalarType enum
* WIP changes to allow different definitions of CeedScalar
* Introduce new header files for float and double
* Only use avx tensor contract and MAGMA non-tensor basis if CeedScalar is double
* WIP changes to allow CeedScalar to be float
* WIP start trying to adjust test tolerances for float or double
* fix typos in comments
* install ceed-f32/64 headers
* Fix missing casts for hipMAGMA element restrictions
* make CeedQFunctionContextGetContextSize available for Python bindings
* Changes to Python bindings to allow CeedScalar to be float
* WIP adjust Python tests for float or double
* make style
* remove QFunctionContextGetContextSize from backend header
* Use quotes instead of <> in include statement
* Remove unnecessary includes
* Update tolerances for tests
* [Julia] allow CeedScalar to be Float32
* [Julia] Use Preferences instead of custom build configuration
# Conflicts:
# julia/LibCEED.jl/src/C.jl
* [Makefile] Change definition of CC_VENDOR so it works with cross-compilation
* [Julia] Use Preferences in CI
# Conflicts:
# .github/workflows/julia-test-with-style.yml
* [Julia] Update docs about preferences
* [Julia] Add test/Project.toml workaround for Preferences
* Add CeedGetScalarType to get the type of CeedScalar at runtime
* [Julia] Move functions from Ceed.jl to LibCEED.jl
* [Julia] Add support for getting library path and scalar type at runtime
* [Julia] Minor change to checking if CUDA is loaded
* [Julia] Check correct CeedScalar types in basis functions
* [Julia] Fix tests comparing with output file
* [Julia] Change devtests to use CeedScalar instead of Float64
* Update test 402 so context will be same size in double or float
* Update tolerances for ceed examples
* [Julia] CUDA fixes
* remove unused variable in t208
* SchurDecomposition: do not compute tau on final iteration
* Update tolerances for some basis tests (for single precision)
* Make style
* Python style fixes for basis test
* Add single precision output for t300 and t320 and adjust checks; skip t541 in single
* Add LCOV exclusions after moving to new line
* fix spacing
* Python: make CEED_EPSILON available as libceed.EPSILON
* Python: optional parameter to specify different output file for test comparison
* Python: update tests' use of EPSILON and change test_300 output file for single precision
* Python: add convenience function for getting dtype corresponding to CeedScalar
* rust - add single precision support
* [Julia] Fall back on Float64 if CeedGetScalarType is not available
* [Julia] style
* Adjust tolerance for t301
* xsmm - add single precision support
* avx - add single precision support
* Add initial single precision support for MAGMA non-tensor basis
* Skip t300 and t320 in single precision; revert Python t300 changes
* Revert output changes for t300 and t320 in junit
* [Julia] Changes to autogenerated bindings for mixed precision
* [Julia] style
* [Julia] Check scalar type when changing libceed library path
The check is also performed when the package is loaded. This prevents having to
restart the Julia session twice
* [Julia] Require JLLWrappers version 1.3
This is needed to use Preferences to change the library path
* Add documentation page for precision development
Co-authored-by: Will Pazner <will.e.p@gmail.com>
* Cleanup from merge: remove old README
* Return CEED_ALIGN to backend.h
* Make Fortran compiler (FC) optional; empty skips Fortran tests
Use in Python and Rust builds, which may not have a Fortran compiler
installed and thus would produce confusing output.
* Add single precision CI test for Noether
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Co-authored-by: Will Pazner <will.e.p@gmail.com>
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Co-authored-by: Jed Brown <jed@jedbrown.org>
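A minimal sketch of the compile-time precision switch this entry describes, using hypothetical names:

- One macro selects whether the scalar type is `float` or `double`.
- An enum records that choice so it can be queried at runtime.
- Test tolerances scale with the active type's machine epsilon.

The macro `USE_SINGLE_PRECISION`, the `Scalar`/`ScalarType` names, and the helper functions are stand-ins for illustration. They are not the contents of `ceed-f32.h` or `ceed-f64.h`, nor the actual `CeedGetScalarType` signature.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical stand-ins for a precision-selecting header. */
typedef enum { SCALAR_FP32, SCALAR_FP64 } ScalarType;

#ifdef USE_SINGLE_PRECISION
typedef float Scalar;
#define SCALAR_TYPE SCALAR_FP32
#define SCALAR_EPSILON FLT_EPSILON
#else
typedef double Scalar;
#define SCALAR_TYPE SCALAR_FP64
#define SCALAR_EPSILON DBL_EPSILON
#endif

/* Runtime query of the scalar type this translation unit was built with. */
static ScalarType get_scalar_type(void) { return SCALAR_TYPE; }

static double abs_d(double x) { return x < 0 ? -x : x; }

/* Compare against a reference with a tolerance scaled by the active epsilon
   (relative for |expected| > 1, absolute otherwise), so one test can run in
   either precision. */
static int nearly_equal(Scalar computed, Scalar expected, double scale) {
  double diff = abs_d((double)computed - (double)expected);
  double mag = abs_d((double)expected);
  double ref = mag > 1.0 ? mag : 1.0;
  return diff <= scale * (double)SCALAR_EPSILON * ref;
}
```

Compiling with `-DUSE_SINGLE_PRECISION` switches the translation unit to `float` and loosens the tolerance to match. This mirrors the manual header swap described above rather than an automated build option.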
|
| #
ebc204c0
|
| 15-Apr-2021 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #740 from CEED/natalie/device-id
Update device ID selection for HIP/CUDA/MAGMA backends
|
| #
6dbfb411
|
| 05-Apr-2021 |
nbeams <246972+nbeams@users.noreply.github.com> |
Update device ID selection for HIP/CUDA backends; add for MAGMA backends
|
| #
874019bc
|
| 31-Mar-2021 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #716 from CEED/jed/install-backend.h
Jed/install backend.h
|