History log of /libCEED/backends/ (Results 901 – 925 of 1139)
Revision Date Author Comments
f05116b9 14-Mar-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Drop array argument from backend RestoreArray (#210)

2f86a920 13-Mar-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Add CeedTensorContract object (#211)

52d6035f 13-Mar-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Operator Composition (#197)

* Composite Operator for cpu/self family of backends

* Remove small leak

* Improve C tests

* Add composite operator to Fortran interface and tests

* Fix Fortran test missing destroys

* Fortran test okl files, currently not used

* fix error in composite ' add' flag logic

* Switch composite op tests to f90

* Check for operator type on utility functions

* Documentation and test cleanup

* Make Style

84a01de5 12-Mar-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Serial and Blocked AVX Backends (#198)

* Add serial AVX backend

* Style and README changes

* Simplify AVX serial tensor loop

* Minor performance improvement

* C=1 AVX scalar case

* Increase use of AVX commands for edge cases

* Prep for eventual Tensor Object

* Comment updates

* Readme update

* Update README

* Refactor to reduce code

* Increase vectorization in remainder of columns

* Vectorize column remainder on C=1 case

* Switch to static inlining for AVX tensor contract

* Tidying for merge

* make style

* Style cleanup

* Full register use for columns

* Make style

cdf4f918 09-Mar-2019 jeremylt <jeremy.thompson@colorado.edu>

Apply style changes

856142e1 06-Feb-2019 jeremylt <jeremy.thompson@colorado.edu>

Backend naming adjustment

4d1cd9fc 06-Feb-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Add Nek to Travis (#169)

* Add test mode to Nek BP1 and BP3, improve Nek BPs

* Fix OCCA identity rst for multifield, minor NekBP1 fix

* Improve Nek run script

* Add Nek5K to prove-all

* Update travis yml for Nek5K

* Make style

* Adjust Travis yml

* Combine Nek run bash scripts

* Minor Nek script improvements

* Update to Nek 18.0 and reduce number of Nek compiler warnings

* Document required Nek5k version

* Remove stray command

* Remove extra file

* Adapt Nek for CUDA backend

* Fix Nek script string comparison

* Modify Nek script for better exit codes

* typo fix

* Modify the CU function names in nek/bp1.cu and nek/bp3.cu

* .cu file consistency

* Tidy Travis

* Tidy Travis

* Operator fixes


/libCEED/.travis.yml
/libCEED/LICENSE
/libCEED/Makefile
/libCEED/README.md
blocked/ceed-blocked-basis.c
blocked/ceed-blocked-operator.c
occa/ceed-occa-operator.c
occa/ceed-occa-restrict.c
occa/ceed-occa-restrict.okl
occa/ceed-occa.h
ref/ceed-ref-operator.c
/libCEED/benchmarks/.gitignore
/libCEED/benchmarks/README.md
/libCEED/benchmarks/benchmark.sh
/libCEED/benchmarks/petsc-bp1.sh
/libCEED/benchmarks/petsc-bp3.sh
/libCEED/benchmarks/postprocess-base.py
/libCEED/benchmarks/postprocess-plot.py
/libCEED/benchmarks/postprocess-table.py
/libCEED/examples/Makefile
/libCEED/examples/ceed/ex1.cu
/libCEED/examples/mfem/bp1.cu
/libCEED/examples/mfem/bp3.cu
/libCEED/examples/nek5000/README.md
/libCEED/examples/nek5000/bp1.cu
/libCEED/examples/nek5000/bp1.okl
/libCEED/examples/nek5000/bp1.usr
/libCEED/examples/nek5000/bp3.cu
/libCEED/examples/nek5000/bp3.okl
/libCEED/examples/nek5000/bp3.usr
/libCEED/examples/nek5000/make-nek-examples.sh
/libCEED/examples/nek5000/make-nek-tests.sh
/libCEED/examples/nek5000/run-nek-example.sh
/libCEED/examples/petsc/Makefile
/libCEED/examples/petsc/bp1.c
/libCEED/examples/petsc/bp1.cu
/libCEED/examples/petsc/bp1.h
/libCEED/examples/petsc/bp3.c
/libCEED/examples/petsc/bp3.cu
/libCEED/examples/petsc/bp3.h
/libCEED/tests/t000-init-f.f90
/libCEED/tests/t100-vec-f.f90
/libCEED/tests/t101-vec-f.f90
/libCEED/tests/t102-vec-f.f90
/libCEED/tests/t103-vec-f.f90
/libCEED/tests/t104-vec-f.f90
/libCEED/tests/t105-vec-f.f90
/libCEED/tests/t106-vec-f.f90
/libCEED/tests/t107-vec-f.f90
/libCEED/tests/t108-vec-f.f90
/libCEED/tests/t200-elemrestriction-f.f90
/libCEED/tests/t201-elemrestriction-f.f90
/libCEED/tests/t202-elemrestriction-f.f90
/libCEED/tests/t203-elemrestriction-f.f90
/libCEED/tests/t204-elemrestriction-f.f90
/libCEED/tests/t205-elemrestriction-f.f90
/libCEED/tests/t206-elemrestriction-f.f90
/libCEED/tests/t207-elemrestriction-f.f90
/libCEED/tests/t300-basis-f.f90
/libCEED/tests/t301-basis-f.f90
/libCEED/tests/t302-basis-f.f90
/libCEED/tests/t303-basis-f.f90
/libCEED/tests/t304-basis-f.f90
/libCEED/tests/t305-basis-f.f90
/libCEED/tests/t306-basis-f.f90
/libCEED/tests/t307-basis-f.f90
/libCEED/tests/t310-basis-f.f90
/libCEED/tests/t311-basis-f.f90
/libCEED/tests/t312-basis-f.f90
/libCEED/tests/t313-basis-f.f90
/libCEED/tests/t400-qfunction-f.cu
/libCEED/tests/t400-qfunction-f.f90
/libCEED/tests/t400-qfunction.cu
/libCEED/tests/t401-qfunction-f.cu
/libCEED/tests/t401-qfunction-f.f90
/libCEED/tests/t401-qfunction.cu
/libCEED/tests/t500-operator-f.cu
/libCEED/tests/t500-operator-f.f90
/libCEED/tests/t500-operator.cu
/libCEED/tests/t501-operator-f.cu
/libCEED/tests/t501-operator-f.f90
/libCEED/tests/t501-operator.cu
/libCEED/tests/t502-operator-f.cu
/libCEED/tests/t502-operator-f.f90
/libCEED/tests/t502-operator.cu
/libCEED/tests/t510-operator-f.f90
/libCEED/tests/t511-operator-f.f90
/libCEED/tests/tap.sh
c286a8bf 14-Jan-2019 jeremylt <jeremy.thompson@colorado.edu>

Switch libXSMM serial basis apply to Nek style

8d713cf6 20-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

Initial libXSMM backend

9f0427d9 12-Jan-2019 Yohann <yohann.dudouit@gmail.com>

Cuda backend (#175)

Thanks-to: Steven Roberts
- for achieving most of the initial work, the code was well designed, clean, and pleasantly written.
Thanks-to: Jeremy Thompson
- for his constant support, exceptional patience, and the numerous relevant suggestions.

* Start cuda branch

* Start cuda branch

* Cuda backend works correctly for example 1

* More reliable operator destroy

* Fix cuda registration

* Makefile now works for cuda backend

* Start qfunction parallelization

* Remove extra cuda flags

* Cuda backend uses vector api instead of directly accessing internals

* Fix header from find and replace mistake

* Cuda qfunction callback working properly

* Cuda uses same integer pow function as other backends

* Use nvcc if available to support Cuda backend

* Remove extra memcpys from getting and restoring arrays

* MFEM examples work for cuda backend

* Optimized basis kernels to better utilize shared memory

* More kernel optimization

* Active/passive updates

* Make cuda kernels static to minimize external functions

* Fix cuda qfunction kernel loop condition

* Switch to NVRTC for cuda backend

* Add nelem argument to cuda basis apply

* First commit for the libParanumal backend

* Adds a function skeleton for the ceed-libparanumal-opearator.c

* Adds OperatorDestroy and OperatorSetupFields to the libParanumal backend.

* Adds some guidelines for the implementation of the backend.

* Partially implement OperatorSetup for libparanumal.

- The core of the OperatorSetup is written
- Adds a spec field to CeedQFunction_private

* Adds the CeedQFunctionCreateInteriorFromGallery.

- The gallery only contains a skeleton for "elliptic" for the moment.
- Comment out some code that is unnecessary for the moment.

* Change the default fields for elliptic.

* Add setters, remove impl header from CPU, OCCA backends

* Add global NUM_BACKEND, fix qf user pointer getter

* Improve operator field frees

* Update MAGMA backend

* Use Occa Vectors in the libParanumal backend.

* Typo Fix

* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted

* Implements the new version of CeedQFunctionApply_Cuda.

* Update the Cuda backend to PR174.

* Bug fix in Cuda backend.

- Replace sprintf by snprintf
- More careful use of the macro 'va_arg'

* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted

* Update MAGMA backend to vector inputs

* Modify restriction create in the cuda backend to handle memory correctly.

* Modify restriction destroy and apply of the cuda backend.

* Corrects a few typos in the cuda backend.

* Replace a CeedFree by a cudaFree...

* CeedVectorRestoreArrayRead was unnecessarily syncing data.

* CeedVectorRestoreArrayRead was unnecessarily syncing data.

* [FIX] Adds CeedVectorRestoreArray in the restriction of the cuda backend.

* Adds an error check.

* Handles indice==NULL for identity restriction.

* Adds an CeedElemRestrictionCreateBlocked_Cuda that errors.

* Adds VectorRestor in BasisApply.

* Attempt to make SetValue function.

* Adds the memState variable inside the CeedVectorCuda and uses it.

* Fix a bug that was passing the pointer instead of the address of
the pointer to CeedFree......

* Some cleaning.

* Fix a logic error in VectorGetArray.

- Now allocates an array whatever the memState is

* Fix: Basis apply checks if emode!=CEED_EVAL_WEIGHT before getting u array.

* Cleaning for PR to libCEED repo.

* Uses Setters instead of direct struct access.

* Use Getters instead of direct structure access.

* Minor: forgot to get ierr after calling some functions.

* Forget to add the SetValue function in Cuda Vector...

* minor: Works even better if we give the right function to SetValue

* Fix: Set the right function for RestrictionBlocked...

* Replace some CeedChk with CeedChk_Cu

* Fix: Replace 'vec' by its length 'length'.

* Adds some CeedChk.

* Fix the Cuda_context_destroyed bug

* Adds error checking to cudaMemcpyH2D but not to D2H since it errors...

* Use Occa file approach to read Cuda QFunctions.

* Fix a few bugs

* Test a new approach to pass the qFunction fields.

* Remove typo in t400.cu and remove debugging printf.

* Append the Cuda Fields struct at the beginning of each qFunction .cu file.

* Add qFunctions for t500, t501 and t502.

* Correct cu functions for t502.

* Memcpy the ctx on the device at each Apply call.

* Checks errors in VectorSync.

* Modifies a bit the memState logic.

* Adds a Cuda implementation of Operator instead of using Ref.

* Remove some unnecessary GetArray in OperatorApply.

* Does a trick for CEED_EVAL_NONE output.

* Fix a bug in CEED_EVAL_WEIGHT.

* Applies the QFunction to all elements, not only the first one...

* A debugging commit.

* Fix: CEED_EVAL_WEIGHT use nelem in BasisApply_Cuda.

* Rewritten weight kernel.

* All C tests pass.

* Cleaning for PR.

* Remove unneeded commented code.

* Remove commented code.

* Remove the check on the pointer in RestoreArray.

* Fix a CeedFree bug.

* Fix the edata memory leak.

* Fix misuse of CeedFree.

* Allocate device memory if there is a magic context appearing due to Fortran.

* make style

* Adds cu files for petsc/bp1 mfem/bp1 and ceed/ex1.

* Remove a warning.

* Remove switch case fall-through to remove warnings.

* Remove some bugs, make other bugs show up.

* Implement the Identity Restriction.

* Size correctly the restriction.

* Modify GPU restriction kernels instead of making dummy identity.

* Add cudaFree(0) before compiling to initialize the context (?!)

* Rewritten weight kernel.

* Fix typo in weight kernel.

* Fix typo in weight kernel.

* Add bp1.cu and bp3.cu for the petsc examples.

* Rewritten interp kernel for Cuda backend.

The interp kernel was not writing data in the layout that the
QFunction is expecting.

* Rewritten grad kernel for Cuda backend.

- Small fix on the interp kernel.
- The grad kernel was not writing data in the layout that the
QFunction is expecting.

* Fix the logic in interp kernel.

* Fix the shared memory size.

* Modify grad kernel to take into account the libCEED data layout.

* Add a cuda file for mfem/bp3.

* Add synchronisation to mfem bp1 and bp3.

* Fix the grad and weight kernel to have the correct data layout.

* Forgotten cu files for Fortran.

* Corrects some typos in the Cuda file for petsc/bp1.

* Add Cuda files for the new t401 test.

* Update the logic on the transfer of the qFunction ctx.

* Write petsc/bp1 in C++ instead of C.

* Minor fix: typo

* Add synchronization to petsc/bp1+bp3.

* Removes the sync on rho in petsc/bp1+bp3.

* Integrate Jeremy Thompson's remarks to the PR.

* Use CeedError instead of exit(1).

* Removes -lstdc++ and adds Ceed in front of DeviceSetValue function.

* Removes synchronization on 'u' in the Apply.

* minor

* make style

* Use the new context interface.

* Minor

* Minor.

* Minor.

* Make style using align-pointer=name

* Minor: some cleaning

* CeedQFunctionUser: write documentation

* Make NVCC compatible with new OPT compiler options

48fffa06 17-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

avx vectorized backend

Edge cases for AVX BasisApply

Priority adjustment to match libXSMM branch

Remove scalar/simd mix for Intel

Check for CC AVX support

AVX: proposed doc and makefile detection update

14922b2a 07-Jan-2019 jeremylt <jeremy.thompson@colorado.edu>

Simplify accessing of inner context

069aeaba 19-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

Getters updated for fCtx values

418fb8c2 19-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

Improve QFunction ctx for Fortran interface

d45ec610 13-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

Fix multifield output bug in QFApply_Ref

45918e5c 12-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

Make style

16c359e6 12-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

Check state of input vectors for Blocked backend

91703d3f 12-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

Improve Ref/Blocked handling of operator vectors

16383ffc 25-Nov-2018 jeremylt <jeremy.thompson@colorado.edu>

Update MAGMA backend to vector inputs

aedaa0e5 19-Nov-2018 jeremylt <jeremy.thompson@colorado.edu>

Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted

1dfeef1d 12-Dec-2018 jeremylt <jeremy.thompson@colorado.edu>

Make style

74b949fc 20-Nov-2018 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Typo Fix

5a51ea82 16-Nov-2018 jeremylt <jeremy.thompson@colorado.edu>

Update MAGMA backend

28d161ee 15-Nov-2018 jeremylt <jeremy.thompson@colorado.edu>

Add global NUM_BACKEND, fix qf user pointer getter

fe2413ff 14-Nov-2018 jeremylt <jeremy.thompson@colorado.edu>

Add setters, remove impl header from CPU, OCCA backends
