| 91cbf07c | 01-Feb-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Edit t200-f |
| 8d713cf6 | 20-Dec-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Initial libXSMM backend |
| 9f0427d9 | 12-Jan-2019 |
Yohann <yohann.dudouit@gmail.com> |
Cuda backend (#175)
Thanks-to: Steven Roberts
- for achieving most of the initial work, the code was well designed, clean, and pleasantly written.
Thanks-to: Jeremy Thompson
- for his constant support, exceptional patience, and the numerous relevant suggestions.
* Start cuda branch
* Start cuda branch
* Cuda backend works correctly for example 1
* More reliable operator destroy
* Fix cuda registration
* Makefile now works for cuda backend
* Start qfunction parallelization
* Remove extra cuda flags
* Cuda backend uses vector api instead of directly accessing internals
* Fix header from find and replace mistake
* Cuda qfunction callback working properly
* Cuda uses same integer pow function as other backends
* Use nvcc if available to support Cuda backend
* Remove extra memcpys from getting and restoring arrays
* MFEM examples work for cuda backend
* Optimized basis kernels to better utilize shared memory
* More kernel optimization
* Active/passive updates
* Make cuda kernels static to minimize external functions
* Fix cuda qfunction kernel loop condition
* Switch to NVRTC for cuda backend
* Add nelem argument to cuda basis apply
* First commit for the libParanumal backend
* Adds a function skeleton for the ceed-libparanumal-opearator.c
* Adds OperatorDestroy and OperatorSetupFields to the libParanumal backend.
* Adds some guidelines for the implementation of the backend.
* Partially implement OperatorSetup for libparanumal.
- The core of the OperatorSetup is written
- Adds a spec field to CeedQFunction_private
* Adds the CeedQFunctionCreateInteriorFromGallery.
- The gallery only contains a skeleton for "elliptic" for the moment.
- Comment out some code unnecessary for the moment.
* Change the default fields for elliptic.
* Add setters, remove impl header from CPU, OCCA backends
* Add global NUM_BACKEND, fix qf user pointer getter
* Improve operator field frees
* Update MAGMA backend
* Use Occa Vectors in the libParanumal backend.
* Typo Fix
* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted
* Implements the new version of CeedQFunctionApply_Cuda.
* Update the Cuda backend to PR174.
* Bug fix in Cuda backend.
- Replace sprintf by snprintf
- More careful use of the macro 'va_arg'
* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted
* Update MAGMA backend to vector inputs
* Modify restriction create in the cuda backend to handle memory correctly.
* Modify restriction destroy and apply of the cuda backend.
* Corrects a few typos in the cuda backend.
* Replace a CeedFree by a cudaFree...
* CeedVectorRestoreArrayRead was syncing data unnecessarily.
* CeedVectorRestoreArrayRead was syncing data unnecessarily.
* [FIX] Adds CeedVectorRestoreArray in the restriction of the cuda backend.
* Adds an error check.
* Handles indice==NULL for identity restriction.
* Adds an CeedElemRestrictionCreateBlocked_Cuda that errors.
* Adds VectorRestor in BasisApply.
* Attempt to make SetValue function.
* Adds the memState variable inside the CeedVectorCuda and uses it.
* Fix a bug that was passing the pointer instead of the address of
the pointer to CeedFree......
* Some cleaning.
* Fix a logic error in VectorGetArray.
- Now allocates an array whatever the memState is
* Fix: Basis apply checks if emode!=CEED_EVAL_WEIGHT before getting u array.
* Cleaning for PR to libCEED repo.
* Uses Setters instead of direct struct access.
* Use Getters instead of direct structure access.
* Minor: forgot to get ierr after calling some functions.
* Forget to add the SetValue function in Cuda Vector...
* minor: Works even better if we give the right function to SetValue
* Fix: Set the right function for RestrictionBlocked...
* Replace some CeedChk with CeedChk_Cu
* Fix: Replace 'vec' by its length 'length'.
* Adds some CeedChk.
* Fix the Cuda_context_destroyed bug
* Adds error checking to cudaMemcpyH2D but not to D2H since it errors...
* Use Occa file approach to read Cuda QFunctions.
* Fix a few bugs
* Test a new approach to pass the qFunction fields.
* Remove typo in t400.cu and remove debugging printf.
* Append the Cuda Fields struct at the beginning of each qFunction .cu file.
* Add qFunctions for t500, t501 and t502.
* Correct cu functions for t502.
* Memcpy the ctx on the device at each Apply call.
* Checks errors in VectorSync.
* Modifies a bit the memState logic.
* Adds a Cuda implementation of Operator instead of using Ref.
* Remove some unnecessary GetArray in OperatorApply.
* Does a trick for CEED_EVAL_NONE output.
* Fix a bug in CEED_EVAL_WEIGHT.
* Applies the QFunction to all elements, not only the first one...
* A debugging commit.
* Fix: CEED_EVAL_WEIGHT use nelem in BasisApply_Cuda.
* Rewritten weight kernel.
* All C tests pass.
* Cleaning for PR.
* Remove unneeded commented code.
* Remove commented code.
* Remove the check on the pointer in RestoreArray.
* Fix a CeedFree bug.
* Fix the edata memory leak.
* Fix misuse of CeedFree.
* Allocate device memory if there is a magic context appearing due to Fortran.
* make style
* Adds cu files for petsc/bp1 mfem/bp1 and ceed/ex1.
* Remove a warning.
* Remove switch case fall-through to remove warnings.
* Remove some bugs, make other bugs show up.
* Implement the Identity Restriction.
* Size correctly the restriction.
* Modify GPU restriction kernels instead of making dummy identity.
* Add cudaFree(0) before compiling to initialize the context (?!)
* Rewritten weight kernel.
* Fix typo in weight kernel.
* Fix typo in weight kernel.
* Add bp1.cu and bp3.cu for the petsc examples.
* Rewritten interp kernel for Cuda backend.
The interp kernel was not writing data in the layout that the
QFunction is expecting.
* Rewritten grad kernel for Cuda backend.
- Small fix on the interp kernel.
- The grad kernel was not writing data in the layout that the
QFunction is expecting.
* Fix the logic in interp kernel.
* Fix the shared memory size.
* Modify grad kernel to take into account the libCEED data layout.
* Add a cuda file for mfem/bp3.
* Add synchronisation to mfem bp1 and bp3.
* Fix the grad and weight kernel to have the correct data layout.
* Forgotten cu files for Fortran.
* Corrects some typos in the Cuda file for petsc/bp1.
* Add Cuda files for the new t401 test.
* Update the logic on the transfer of the qFunction ctx.
* Write petsc/bp1 in C++ instead of C.
* Minor fix: typo
* Add synchronization to petsc/bp1+bp3.
* Removes the sync on rho in petsc/bp1+bp3.
* Integrate Jeremy Thompson's remarks to the PR.
* Use CeedError instead of exit(1).
* Removes -lstdc++ and adds Ceed in front of DeviceSetValue function.
* Removes synchronization on 'u' in the Apply.
* minor
* make style
* Use the new context interface.
* Minor
* Minor.
* Minor.
* Make style using align-pointer=name
* Minor: some cleaning
* CeedQFunctionUser: write documentation
* Make NVCC compatible with new OPT compiler options
|
| a054982a | 09-Jan-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Set Fortran test file width to 80 |
| 10da579b | 31-Dec-2018 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Use -ffree-form for Fortran test suite |
| 1e35832b | 19-Dec-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Add t401 to test QFunction with context |
| aedaa0e5 | 19-Nov-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted |
| 1dfeef1d | 12-Dec-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Make style |
| 6f81c5e4 | 05-Dec-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Check for access lock on RestoreArray |
| 5b9c149c | 20-Nov-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Check for readers before setting array |
| 2cd729ee | 16-Nov-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
CeedVecs - Check for read access before granting write access |
| 73d26085 | 14-Nov-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
t204,205,206,207,502 restriction tests added, OCCA fix |
| 4dccadb6 | 30-Oct-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Add lmode field to CeedOperatorSetField |
| 0a5a520a | 06-Nov-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'getters' of github:ceed/libceed [PR #167]
* 'getters' of github:ceed/libceed:
  Improved documentation
  Add Operator/QFunction field getters
  Update documentation
  Separate to 3 header files
  First round of getters
[Remove unnecessary ceed-impl.h in merge.]
|
| d863ab9b | 19-Oct-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Separate to 3 header files |
| aad944dc | 17-Sep-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Indent include statements in Fortran tests |
| fc140ed8 | 12-Sep-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Change grep to check for OCCA backend not supported |
| 4a2e7687 | 04-Sep-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Rename /cpu/self/opt to /cpu/self/blocked |
| a8de75f0 | 17-Aug-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Non-tensor bases
Add simplex integration test
Add simplex grad test
Style changes
Common header for t32* tests, reorder grad
Add t520 operator test with 2D simplex basis
Add t501 and t521 non-zero operator tests
Adjust Fortran tests for clarity
Explicitly cast Fortran values as doubles in tests
Modify PR97 for new Fortran interface
Flatten CEED_TOPO to include dimension
Rebase PR 97 to new testing convention
Reorder ElemTopo to embed dimension bitwise, doc fix
Switch numbering convention, add GetTopologyDimension
Fortran headers for t31* and t51*, adjust PR97 for COLLOCATED typo
|
| 1adffcc9 | 28-Aug-2018 |
Valeria Barra <valeria.barra@colorado.edu> |
fixed other occurrences of 'colocated' |
| 783c99b3 | 28-Aug-2018 |
Valeria Barra <valeria.barra@colorado.edu> |
Refactored 'colocated' misspelling |
| 4cc6aec3 | 26-Aug-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Add testing whitelist
Explicitly check for t103 or t104 for expected fail |
| c1e35e21 | 26-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'occa-vec' [PR #99]
* occa-vec:
  Remove large stack allocation
  Fix occa vector memory management |
| 31d78078 | 17-Aug-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Fix t304 |
| 6456524e | 15-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
fortran: add offset argument to ceedvectorrestorearray, use in Nek examples
This allows the offset integers to be zeroed, thus preventing potentially unpredictable behavior if that value is accidentally used.
|