History log of /libCEED/backends/cuda-shared/ceed-cuda-shared-basis.c (Results 76 – 100 of 145)
Revision Date Author Comments
# 2b730f8b 17-Nov-2022 Jeremy L Thompson <jeremy@jeremylt.org>

Switch to clang-format (#1051)

* style - switch to clang-format

* ci - use newer libxsmm

* action - update format action

* format - consistent use of {} for multi-line if/for

* make - remove stray newline

* make - simpler 'make format' target

* ci - use newer libxsmm

* doc - minor release note clarification

* minor - minor fix

* minor - minor fix

* minor - minor fix

* minor - minor fix

* make format

* format - less aggressive alignment rules

* tidy - check for argument name mismatches

* fix newline

* format - mirror Ratel update to .clang-format

* fix merge error

* fix merge conflict

* fix merge error

* drop style in .phony list

* Update .clang-format

Co-authored-by: Jed Brown <jed@jedbrown.org>

* apply updated format

Co-authored-by: Jed Brown <jed@jedbrown.org>


# 9e201c85 23-Sep-2022 Yohann <dudouit1@llnl.gov>

Refactor `cuda-gen` and `hip-gen` backends. (#1050)

* Add TODO items.

* rough, but something like this?

* wip - cleaning up some warnings, but more remain

* wip - reorganize

* wip - missing kernels

* wip - replace t1d

* fix some kernels

* another typo

* more

* another one

* closer

* define T_1D

* typosgit add .!

* WIP: changes to cuda-shared framework for new kernels

* fix output writing

* buffer fix

* buffer sizes

* WIP: fixes for 2 and 3D basis kernels

* minor

* fix weight kernel for 3d

* remove debugging output

* minor reorg

* fix includes

* enable collo grad for cuda-shared

* move quoted kernels

* renaming

* missed a rename

* small fix

* more naming consistency

* faster 'useCollograd=false' path in *-gen

* more style

* one last style fix

* clearer collograd condition

* Add gen basis kernels to hip-shared

* Try some changes to hip-shared basis block sizes for new kernels

* cuda - drop extra kernel arg

* cuda - fix collograd check logic

* update gen comment about parallelization

* tidy up fields struct definition

* tidy up structs even more

* Update hip-gen basis templates use and move other hip-gen device functions to jit-source

* Finish hip-gen basis template update; small style updates to match CUDA

* missing isStrided

* Update block size used in 3D weight for new shared kernels

* update release notes

Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Co-authored-by: nbeams <246972+nbeams@users.noreply.github.com>


# 18562a3a 08-Apr-2022 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #935 from CEED/jeremy/install-folder

Install qf and kernels


# ee5a26f2 04-Apr-2022 Jeremy L Thompson <jeremy@jeremylt.org>

jit - add interface for adding additional jit source dirs


# a0154ade 04-Apr-2022 Jed Brown <jed@jedbrown.org>

move include/ceed-jit-source to include/ceed/jit-source


# 6eb0d8b4 01-Apr-2022 Jeremy L Thompson <jeremy@jeremylt.org>

jit - use relpath from include/ceed-jit-source for jit source files


# ce18bed9 17-Mar-2022 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #858 from CEED/jeremy/dump-copy-stuff

Strip redundant/outdated license info duplication


# 3d8e8822 17-Mar-2022 Jeremy L Thompson <jeremy@jeremylt.org>

minor - update copyright headers


# 60224bc5 14-Mar-2022 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #913 from CEED/jeremy/coo-ptrdiff

Create CeedSize as ptrdiff_t


# 1f9221fe 11-Mar-2022 Jeremy L Thompson <jeremy@jeremylt.org>

vec - use CeedSize for vector lengths


# e2cfdb03 18-Feb-2022 Jed Brown <jed@jedbrown.org>

Merge pull request #902 from CEED/jed/cuda-block-sizes

backends cuda-shared: fix launch bounds to avoid invalid z dimension


# e6f67ff7 18-Feb-2022 Jed Brown <jed@jedbrown.org>

backends cuda-shared: fix launch bounds to avoid invalid z dimension

The typical max z dimension size of a thread block is 64 and we were
computing larger values (like 85) in some cases.
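
The clamp described above can be sketched in plain C. This is a minimal illustration, not the code from the commit: the helper name `elems_per_block_z` and the constant `MAX_BLOCK_DIM_Z` are hypothetical, and the limit of 64 is taken from the commit message's "typical max z dimension".

```c
#include <stddef.h>

/* Assumption: 64 is the typical maximum z dimension of a CUDA thread block,
   as stated in the commit message. The name is hypothetical. */
#define MAX_BLOCK_DIM_Z 64

/* Hypothetical helper: choose how many elements to stack along the z
   dimension of a thread block, given the threads one element needs in
   x*y and the per-block thread budget. The result is clamped so it is
   never below 1 and never above MAX_BLOCK_DIM_Z. */
static int elems_per_block_z(int threads_per_elem, int max_threads_per_block) {
  if (threads_per_elem <= 0 || max_threads_per_block <= 0) return 1;
  int elems = max_threads_per_block / threads_per_elem;
  if (elems < 1) elems = 1;
  if (elems > MAX_BLOCK_DIM_Z) elems = MAX_BLOCK_DIM_Z;
  return elems;
}
```

With 6 threads per element and a budget of 512 threads, the unclamped quotient is 85, which is the kind of value the commit message says was being computed; the clamp reduces it to 64.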


# 8b036261 16-Feb-2022 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #899 from CEED/jed/cu-lv-cuda-11.6

CI: update lv for cuda-11.6


# c47bfe2b 16-Feb-2022 Jed Brown <jed@jedbrown.org>

backends/cuda-shared: limit 1D thread counts

We need to avoid this error:

CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: max_threads_per_block 512 on block size (24,1,32), shared_size 0, num_regs 106

A proper solution is to use cuOccupancyMaxPotentialBlockSize to place a
number of elements per block that stays within resource limits. This
would involve a bit more refactoring to do cleanly.
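
The failure above comes from a block of 24 x 1 x 32 = 768 threads exceeding the kernel's limit of 512. A minimal sketch of the simpler workaround (shrinking the z extent until the block fits) might look like the following. The helper name `fit_block_z` is hypothetical, and this is not the occupancy-based solution the message calls proper; it does not call `cuOccupancyMaxPotentialBlockSize`.

```c
#include <stddef.h>

/* Hypothetical helper: given a requested block shape (dim_x, dim_y, dim_z)
   and the kernel's max threads per block, return the largest z extent not
   exceeding dim_z such that dim_x * dim_y * z stays within the limit.
   Returns 0 if even a single z slice does not fit. */
static int fit_block_z(int dim_x, int dim_y, int dim_z, int max_threads_per_block) {
  int per_slice = dim_x * dim_y;
  if (per_slice <= 0 || per_slice > max_threads_per_block) return 0;
  int max_z = max_threads_per_block / per_slice;
  return dim_z < max_z ? dim_z : max_z;
}
```

For the block size in the error message, `fit_block_z(24, 1, 32, 512)` gives 21, so the block becomes 24 x 1 x 21 = 504 threads, which is within the limit of 512.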


# 51d630a3 24-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #864 from CEED/jeremy/gpu-templates

GPU - pull quoted kernels into separate files


# d7d111ec 23-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

gpu - style consistency


# 46dc0734 23-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

gpu - improved human-readability of debugging output


# 437930d1 22-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

gpu - pull quoted kernels into separate files


# d92fedf5 22-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #863 from CEED/jeremy/gpu-jit-code

GPU - separate common code into separate folder


# 6d69246a 21-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

cuda - separate compile functionality into new header


# 9c774edd 17-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

vec/qf - initial valid/borrowed/owned split for data (#853)

* vec/qf - initial valid/borrowed/owned split for data

* vec/qf - tidy logic for checking active/stale data

* minor - add missing NULL

* doc - explain VectorTakeArray update

* minor - update error messages

* test - update error message in junit/tap

* gpu - fix stray CeedScalar vs void for QFunctionContext

* vec/qf - clarify/simplify access logic

* vec - calloc host arrays when no value set to make empty

* style - minor

* style - minor

* minor - fix error messages

* vec/qf - move data validity checking to backend interface

* gpu - add missing sync error checking for qfcontext

* gpu - homogenize use of impl for backend data to reduce confusion

* vec - clarify access conditions

* python - update test for stricter vector access

* vec - minor fixes

* minor - fix ipython change

* vec - add missing declarations in ceed/backend.h

* ctx - mirror vector borrowed data check in ctx interface

* vec - add CeedVectorGetArrayWrite

* vec - consistent use of CeedVectorGetArray vs CeedVectorGetArrayWrite

* python - small vec fixes

* doc - describe vector data semantics

* magma - update restriction

* gpu - fix restr bug I added, need to sum into target

* magma - fix restriction bug

* cpu - fix restriction bug here too

* op - fix evec allocations

* julia - fix ElemRestriction for new vector access rules

* op - double check GetArray vs Read vs Write usage

* doc - small fix

* restr - clean up read/write logic for restr

* python - add vec.array_write

* magma - typo fix


# 80a9ef05 02-Sep-2021 Natalie Beams <246972+nbeams@users.noreply.github.com>

Allow CeedScalar to be single precision (#788)

One can modify `ceed.h` to include `ceed-f32.h` and then use single precision. This is tested for C in CI and has been tested by developers with Rust, Julia, and Python. This interface is evolving and should be considered experimental at this time (thus lack of automated build support).

* Introduce CeedScalarType enum

* WIP changes to allow different definitions of CeedScalar

* Introduce new header files for float and double

* Only use avx tensor contract and MAGMA non-tensor basis if CeedScalar is double

* WIP changes to allow CeedScalar to be float

* WIP start trying to adjust test tolerances for float or double

* fix typos in comments

* install ceed-f32/64 headers

* Fix missing casts for hipMAGMA element restrictions

* make CeedQFunctionContextGetContextSize available for Python bindings

* Changes to Python bindings to allow CeedScalar to be float

* WIP adjust Python tests for float or double

* make style

* remove QFunctionContextGetContextSize from backend header

* Use quotes instead of <> in include statement

* Remove unnecessary includes

* Update tolerances for tests

* [Julia] allow CeedScalar to be Float32

* [Julia] Use Preferences instead of custom build configuration

# Conflicts:
# julia/LibCEED.jl/src/C.jl

* [Makefile] Change definition of CC_VENDOR so it works with cross-compilation

* [Julia] Use Preferences in CI

# Conflicts:
# .github/workflows/julia-test-with-style.yml

* [Julia] Update docs about preferences

* [Julia] Add test/Project.toml workaround for Preferences

* Add CeedGetScalarType to get the type of CeedScalar at runtime

* [Julia] Move functions from Ceed.jl to LibCEED.jl

* [Julia] Add support for getting library path and scalar type at runtime

* [Julia] Minor change to checking if CUDA is loaded

* [Julia] Check correct CeedScalar types in basis functions

* [Julia] Fix tests comparing with output file

* [Julia] Change devtests to use CeedScalar instead of Float64

* Update test 402 so context will be same size in double or float

* Update tolerances for ceed examples

* [Julia] CUDA fixes

* remove unused variable in t208

* SchurDecomposition: do not compute tau on final iteration

* Update tolerances for some basis tests (for single precision)

* Make style

* Python style fixes for basis test

* Add single precision output for t300 and t320 and adjust checks; skip t541 in single

* Add LCOV exclusions after moving to new line

* fix spacing

* Python: make CEED_EPSILON available as libceed.EPSILON

* Python: optional parameter to specify different output file for test comparison

* Python: update tests' use of EPSILON and change test_300 output file for single precision

* Python: add convenience function for getting dtype corresponding to CeedScalar

* rust - add single precision support

* [Julia] Fall back on Float64 if CeedGetScalarType is not available

* [Julia] style

* Adjust tolerance for t301

* xsmm - add single precision support

* avx - add single precision support

* Add initial single precision support for MAGMA non-tensor basis

* Skip t300 and t320 in single precision; revert Python t300 changes

* Revert output changes for t300 and t320 in junit

* [Julia] Changes to autogenerated bindings for mixed precision

* [Julia] style

* [Julia] Check scalar type when changing libceed library path

The check is also performed when the package is loaded. This prevents having to
restart the Julia session twice

* [Julia] Require JLLWrappers version 1.3

This is needed to use Preferences to change the library path

* Add documentation page for precision development

Co-authored-by: Will Pazner <will.e.p@gmail.com>

* Cleanup from merge: remove old README

* Return CEED_ALIGN to backend.h

* Make Fortran compiler (FC) optional; empty skips Fortran tests

Use in Python and Rust builds, which may not have a Fortran compiler
installed and thus would produce confusing output.

* Add single precision CI test for Noether

Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>

Co-authored-by: Will Pazner <will.e.p@gmail.com>
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Co-authored-by: Jed Brown <jed@jedbrown.org>


# ebc204c0 15-Apr-2021 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Merge pull request #740 from CEED/natalie/device-id

Update device ID selection for HIP/CUDA/MAGMA backends


# 6dbfb411 05-Apr-2021 nbeams <246972+nbeams@users.noreply.github.com>

Update device ID selection for HIP/CUDA backends; add for MAGMA backends


# 874019bc 31-Mar-2021 Jed Brown <jed@jedbrown.org>

Merge pull request #716 from CEED/jed/install-backend.h

Jed/install backend.h

