| /libCEED/examples/mfem/ |
| bp1.hpp | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
Thanks-to: Steven Roberts
- for achieving most of the initial work, the code was well designed, clean, and pleasantly written.
Thanks-to: Jeremy Thompson
- for his constant support, exceptional patience, and the numerous relevant suggestions.
* Start cuda branch
* Start cuda branch
* Cuda backend works correctly for example 1
* More reliable operator destroy
* Fix cuda registration
* Makefile now works for cuda backend
* Start qfunction parallelization
* Remove extra cuda flags
* Cuda backend uses vector api instead of directly accessing internals
* Fix header from find and replace mistake
* Cuda qfunction callback working properly
* Cuda uses same integer pow function as other backends
* Use nvcc if available to support Cuda backend
* Remove extra memcpys from getting and restoring arrays
* MFEM examples work for cuda backend
* Optimized basis kernels to better utilize shared memory
* More kernel optimization
* Active/passive updates
* Make cuda kernels static to minimize external functions
* Fix cuda qfunction kernel loop condition
* Switch to NVRTC for cuda backend
* Add nelem argument to cuda basis apply
* First commit for the libParanumal backend
* Adds a function skeleton for the ceed-libparanumal-opearator.c
* Adds OperatorDestroy and OperatorSetupFields to the libParanumal backend.
* Adds some guidelines for the implementation of the backend.
* Partially implement OperatorSetup for libparanumal.
- The core of the OperatorSetup is written
- Adds a spec field to CeedQFunction_private
* Adds the CeedQFunctionCreateInteriorFromGallery.
- The gallery only contains a skeleton for "elliptic" for the moment.
- Comment out some code that is unnecessary for the moment.
* Change the default fields for elliptic.
* Add setters, remove impl header from CPU, OCCA backends
* Add global NUM_BACKEND, fix qf user pointer getter
* Improve operator field frees
* Update MAGMA backend
* Use Occa Vectors in the libParanumal backend.
* Typo Fix
* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted
* Implements the new version of CeedQFunctionApply_Cuda.
* Update the Cuda backend to PR174.
* Bug fix in Cuda backend.
- Replace sprintf by snprintf
- More careful use of the macro 'va_arg'
* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted
* Update MAGMA backend to vector inputs
* Modify restriction create in the cuda backend to handle memory correctly.
* Modify restriction destroy and apply of the cuda backend.
* Corrects a few typos in the cuda backend.
* Replace a CeedFree by a cudaFree...
* CeedVectorRestoreArrayRead was syncing data unnecessarily.
* CeedVectorRestoreArrayRead was syncing data unnecessarily.
* [FIX] Adds CeedVectorRestoreArray in the restriction of the cuda backend.
* Adds an error check.
* Handles indice==NULL for identity restriction.
* Adds a CeedElemRestrictionCreateBlocked_Cuda that errors.
* Adds VectorRestore in BasisApply.
* Attempt to make SetValue function.
* Adds the memState variable inside the CeedVectorCuda and uses it.
* Fix a bug that was passing the pointer instead of the address of
the pointer to CeedFree.
* Some cleaning.
* Fix a logic error in VectorGetArray.
- Now allocates an array regardless of the memState
* Fix: Basis apply checks if emode!=CEED_EVAL_WEIGHT before getting u array.
* Cleaning for PR to libCEED repo.
* Uses Setters instead of direct struct access.
* Use Getters instead of direct structure access.
* Minor: forgot to get ierr after calling some functions.
* Forgot to add the SetValue function in Cuda Vector...
* minor: Works even better if we give the right function to SetValue
* Fix: Set the right function for RestrictionBlocked...
* Replace some CeedChk with CeedChk_Cu
* Fix: Replace 'vec' by its length 'length'.
* Adds some CeedChk.
* Fix the Cuda_context_destroyed bug
* Adds error checking to cudaMemcpyH2D but not to D2H since it errors...
* Use Occa file approach to read Cuda QFunctions.
* Fix a few bugs
* Test a new approach to pass the qFunction fields.
* Remove typo in t400.cu and remove debugging printf.
* Append the Cuda Fields struct at the beginning of each qFunction .cu file.
* Add qFunctions for t500, t501 and t502.
* Correct cu functions for t502.
* Memcpy the ctx on the device at each Apply call.
* Checks errors in VectorSync.
* Modifies the memState logic a bit.
* Adds a Cuda implementation of Operator instead of using Ref.
* Remove some unnecessary GetArray in OperatorApply.
* Does a trick for CEED_EVAL_NONE output.
* Fix a bug in CEED_EVAL_WEIGHT.
* Applies the QFunction to all elements, not only the first one...
* A debugging commit.
* Fix: CEED_EVAL_WEIGHT use nelem in BasisApply_Cuda.
* Rewritten weight kernel.
* All C tests pass.
* Cleaning for PR.
* Remove unneeded commented code.
* Remove commented code.
* Remove the check on the pointer in RestoreArray.
* Fix a CeedFree bug.
* Fix the edata memory leak.
* Fix misuse of CeedFree.
* Allocate device memory if there is a magic context appearing due to Fortran.
* make style
* Adds cu files for petsc/bp1 mfem/bp1 and ceed/ex1.
* Remove a warning.
* Remove switch case fall-through to remove warnings.
* Remove some bugs, make other bugs show up.
* Implement the Identity Restriction.
* Size the restriction correctly.
* Modify GPU restriction kernels instead of making dummy identity.
* Add cudaFree(0) before compiling to initialize the context (?!)
* Rewritten weight kernel.
* Fix typo in weight kernel.
* Fix typo in weight kernel.
* Add bp1.cu and bp3.cu for the petsc examples.
* Rewritten interp kernel for Cuda backend.
The interp kernel was not writing data in the layout that the
QFunction is expecting.
* Rewritten grad kernel for Cuda backend.
- Small fix on the interp kernel.
- The grad kernel was not writing data in the layout that the
QFunction is expecting.
* Fix the logic in interp kernel.
* Fix the shared memory size.
* Modify grad kernel to take into account the libCEED data layout.
* Add a cuda file for mfem/bp3.
* Add synchronisation to mfem bp1 and bp3.
* Fix the grad and weight kernel to have the correct data layout.
* Forgotten cu files for Fortran.
* Corrects some typos in the Cuda file for petsc/bp1.
* Add Cuda files for the new t401 test.
* Update the logic on the transfer of the qFunction ctx.
* Write petsc/bp1 in C++ instead of C.
* Minor fix: typo
* Add synchronization to petsc/bp1+bp3.
* Removes the sync on rho in petsc/bp1+bp3.
* Integrate Jeremy Thompson's remarks to the PR.
* Use CeedError instead of exit(1).
* Removes -lstdc++ and adds Ceed in front of DeviceSetValue function.
* Removes synchronization on 'u' in the Apply.
* minor
* make style
* Use the new context interface.
* Minor
* Minor.
* Minor.
* Make style using align-pointer=name
* Minor: some cleaning
* CeedQFunctionUser: write documentation
* Make NVCC compatible with new OPT compiler options
|
| bp3.hpp | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
|
| /libCEED/backends/magma/ |
| ceed-magma.c | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
|
| /libCEED/interface/ |
| ceed-fortran.c | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
|
| ceed-qfunction.c | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
Thanks-to: Steven Roberts
- for achieving most of the initial work, the code was well designed, clean, and pleasantly written.
Thanks-to: Jeremy Thompson
- for his constant support, exceptional patience, and the numerous relevant suggestions.
* Start cuda branch
* Start cuda branch
* Cuda backend works correctly for example 1
* More reliable operator destroy
* Fix cuda registration
* Makefile now works for cuda backend
* Start qfunction parallelization
* Remove extra cuda flags
* Cuda backend uses vector api instead of directly accessing internals
* Fix header from find and replace mistake
* Cuda qfunction callback working properly
* Cuda uses same integer pow function as other backends
* Use nvcc if available to support Cuda backend
* Remove extra memcpys from getting and restoring arrays
* MFEM examples work for cuda backend
* Optimized basis kernels to better utilize shared memory
* More kernel optimization
* Active/passive updates
* Make cuda kernels static to minimize external functions
* Fix cuda qfunction kernel loop condition
* Switch to NVRTC for cuda backend
* Add nelem argument to cuda basis apply
* First commit for the libParanumal backend
* Adds a function skeleton for the ceed-libparanumal-opearator.c
* Adds OperatorDestroy and OperatorSetupFields to the libParanumal backend.
* Adds some guidelines for the implementation of the backend.
* Partially implement OperatorSetup for libparanumal.
- The core of the OperatorSetup is written
- Adds a spec field to CeedQFunction_private
* Adds the CeedQFunctionCreateInteriorFromGallery.
- The gallery only contains a skeleton for "elliptic" for the moment.
- Comment some code unecessary for the moment.
* Change the default fields for elliptic.
* Add setters, remove impl header from CPU, OCCA backends
* Add global NUM_BACKEND, fix qf user pointer getter
* Improve operator field frees
* Update MAGMA backend
* Use Occa Vectors in the libParanumal backend.
* Typo Fix
* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted
* Implements the new version of CeedQFunctionApply_Cuda.
* Update the Cuda backend to PR174.
* Bug fix in Cuda backend.
- Replace sprintf by snprintf
- More careful use of the macro 'va_arg'
* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted
* Update MAGMA backend to vector inputs
* Modify restriction create in the cuda backend to handle memory correctly.
* Modify restriction destroy and apply of the cuda backend.
* Corrects a few typos in the cuda backend.
* Replace a CeedFree by a cudaFree...
* CeedVectorRestoreArrayRead was syncing unnecessarly data.
* CeedVectorRestoreArrayRead was syncing unnecessarly data.
* [FIX] Adds CeedVectorRestoreArray in the restriction of the cuda backend.
* Adds an error check.
* Handles indice==NULL for identity restriction.
* Adds an CeedElemRestrictionCreateBlocked_Cuda that errors.
* Adds VectorRestor in BasisApply.
* Attempt to make SetValue function.
* Adds the memState variable inside the CeedVectorCuda and uses it.
* Fix a bug that was passing the pointer instead of the address of
the pointer to CeedFree......
* Some cleaning.
* Fix a logic error in VectorGetArray.
- Now allocates an array whatever the memState is
* Fix: Basis apply checks if emode!=CEED_EVAL_WEIGHT before getting u array.
* Cleaning for PR to libCEED repo.
* Uses Setters instead of direct struct access.
* Use Getters instead of direct structure access.
* minor forgot to get ierr in after calling some functions.
* Forget to add the SetValue function in Cuda Vector...
* minor: Works even better if we give the right function to SetValue
* Fix: Set the right function for RestrictionBlocked...
* Replace some CeedChk with CeedChk_Cu
* Fix: Replace 'vec' by its length 'length'.
* Adds some CeedChk.
* Fix the Cuda_context_destroyed bug
* Adds error checking to cudaMemcpyH2D but not to D2H since it errors...
* Use Occa file approach to read Cuda QFunctions.
* Fix a few bugs
* Test a new approach to pass the qFunction fields.
* Remove typo in t400.cu and remove debugging printf.
* Append the Cuda Fields struct at the beginning of each qFunction .cu file.
* Add qFunctions for t500, t501 and t502.
* Correct cu functions for t502.
* Memcpy the ctx on the device at each Apply call.
* Checks errors in VectorSync.
* Modifies a bit the memState logic.
* Adds a Cuda implementation of Operator instead of using Ref.
* Remove some unnecessary GetArray in OperatorApply.
* Does a trick for CEED_EVAL_NONE output.
* Fix a bug in CEED_EVAL_WEIGHT.
* Applies the QFunction to all elements, not only the first one...
* A debugging commit.
* Fix: CEED_EVAL_WEIGHT use nelem in BasisApply_Cuda.
* Rewritten weight kernel.
* All C tests pass.
* Cleaning for PR.
* Remove unneeded commented code.
* Remove commented code.
* Remove the check on the pointer in RestoreArray.
* Fix a CeedFree bug.
* Fix the edata memory leak.
* Fix misuse of CeedFree.
* Allocate device memory if there is a magic context appearing due to Fortran.
* make style
* Adds cu files for petsc/bp1 mfem/bp1 and ceed/ex1.
* Remove a warning.
* Remove switch case fall-through to remove warnings.
* Remove some bugs, make other bugs show up.
* Implement the Identity Restriction.
* Size correctly the restriction.
* Modify GPU restriction kernels instead of making dummy identity.
* Add cudaFree(0) before compiling to initialize the context (?!)
* Rewritten weight kernel.
* Fix typo in weight kernel.
* Fix typo in weight kernel.
* Add bp1.cu and bp3.cu for the petsc examples.
* Rewritten interp kernel for Cuda backend.
The interp kernel was not writing data in the layout that the
QFunction is expecting.
* Rewritten grad kernel for Cuda backend.
- Small fix on the interp kernel.
- The grad kernel was not writing data in the layout that the
QFunction is expecting.
* Fix the logic in interp kernel.
* Fix the shared memory size.
* Modify grad kernel to take into account the libCEED data layout.
* Add a cuda file for mfem/bp3.
* Add synchronisation to mfem bp1 and bp3.
* Fix the grad and weight kernel to have the correct data layout.
* Forgotten cu files for Fortran.
* Corrects some typos in the Cuda file for petsc/bp1.
* Add Cuda files for the new t401 test.
* Update the logic on the transfer of the qFunction ctx.
* Write petsc/bp1 in C++ instead of C.
* Minor fix: typo
* Add synchronization to petsc/bp1+bp3.
* Removes the sync on rho in petsc/bp1+bp3.
* Integrate Jeremy Thompson's remarks to the PR.
* Use CeedError instead of exit(1).
* Removes -lstdc++ and adds Ceed in front of DeviceSetValue function.
* Removes synchronization on 'u' in the Apply.
* minor
* make style
* Use the new context interface.
* Minor
* Minor.
* Minor.
* Make style using align-pointer=name
* Minor: some cleaning
* CeedQFunctionUser: write documentation
* Make NVCC compatible with new OPT compiler options
|
| H A D | ceed.c | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
|
| /libCEED/include/ |
| H A D | ceed.h | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
|
| H A D | ceed-impl.h | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
|
| /libCEED/ |
| H A D | Makefile | diff 9f0427d99e9674f1e08f64878fc1ceefe3e53022 Sat Jan 12 00:19:38 UTC 2019 Yohann <yohann.dudouit@gmail.com> Cuda backend (#175)
|