| #
4e79ff5b
|
| 30-Jan-2019 |
Veselin Dobrev <dobrev@llnl.gov> |
In Makefile: replace '-' with '_' in a variable name.
|
| #
2f4d9adb
|
| 26-Jan-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Benchmarking (#187)
* Add make benchmarks
* Various tweaks related to the benchmarks.
* In Makefile:
* target 'all' now builds the library, all tests and examples
* the old 'all' target is now called 'par'
* the target 'examples' will build also the MFEM and PETSc examples if
the respective library is available.
In the benchmarks/ directory:
* remove 'config.sh'
* cleanup unused stuff from 'benchmark.sh'.
* Fix postprocess scripts, convert to Python 3
* Small update in README.md
* Set benchmark cg its max, update gitignore
* Minor makefile fix
* In Makefile, add 'par' to the list of phony targets.
* In benchmarks/postprocess-table.py, sort the table by backend first.
* Small update in examples/petsc/Makefile - add a comment that
PETSC_ARCH can be undefined/empty, e.g. when using PETSc installed
through Spack.
* In Makefile, update the benchmarking targets:
* add separate targets for individual tests: `bench-petsc-bp1`,
`bench-petsc-bp3`, etc
* `make benchmarks` runs all defined benchmarks.
Update README.md to reflect the above changes.
|
| #
f6a4878d
|
| 23-Jan-2019 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #186 from CEED/libxsmm
Initial libXSMM Backend
|
| #
8d713cf6
|
| 20-Dec-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Initial libXSMM backend
|
| #
9f0427d9
|
| 12-Jan-2019 |
Yohann <yohann.dudouit@gmail.com> |
Cuda backend (#175)
Thanks-to: Steven Roberts
- for achieving most of the initial work, the code was well designed, clean, and pleasantly written.
Thanks-to: Jeremy Thompson
- for his constant support, exceptional patience, and the numerous relevant suggestions.
* Start cuda branch
* Start cuda branch
* Cuda backend works correctly for example 1
* More reliable operator destroy
* Fix cuda registration
* Makefile now works for cuda backend
* Start qfunction parallelization
* Remove extra cuda flags
* Cuda backend uses vector api instead of directly accessing internals
* Fix header from find and replace mistake
* Cuda qfunction callback working properly
* Cuda uses same integer pow function as other backends
* Use nvcc if available to support Cuda backend
* Remove extra memcpys from getting and restoring arrays
* MFEM examples work for cuda backend
* Optimized basis kernels to better utilize shared memory
* More kernel optimization
* Active/passive updates
* Make cuda kernels static to minimize external functions
* Fix cuda qfunction kernel loop condition
* Switch to NVRTC for cuda backend
* Add nelem argument to cuda basis apply
* First commit for the libParanumal backend
* Adds a function skeleton for the ceed-libparanumal-opearator.c
* Adds OperatorDestroy and OperatorSetupFields to the libParanumal backend.
* Adds some guidelines for the implementation of the backend.
* Partially implement OperatorSetup for libparanumal.
- The core of the OperatorSetup is written
- Adds a spec field to CeedQFunction_private
* Adds the CeedQFunctionCreateInteriorFromGallery.
- The gallery only contains a skeleton for "elliptic" for the moment.
- Comment some code unnecessary for the moment.
* Change the default fields for elliptic.
* Add setters, remove impl header from CPU, OCCA backends
* Add global NUM_BACKEND, fix qf user pointer getter
* Improve operator field frees
* Update MAGMA backend
* Use Occa Vectors in the libParanumal backend.
* Typo Fix
* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted
* Implements the new version of CeedQFunctionApply_Cuda.
* Update the Cuda backend to PR174.
* Bug fix in Cuda backend.
- Replace sprintf by snprintf
- More careful use of the macro 'va_arg'
* Vector inputs for BasisApply and QFApply; CPU backends, OCCA, and tests converted
* Update MAGMA backend to vector inputs
* Modify restriction create in the cuda backend to handle memory correctly.
* Modify restriction destroy and apply of the cuda backend.
* Corrects a few typos in the cuda backend.
* Replace a CeedFree by a cudaFree...
* CeedVectorRestoreArrayRead was syncing data unnecessarily.
* CeedVectorRestoreArrayRead was syncing data unnecessarily.
* [FIX] Adds CeedVectorRestoreArray in the restriction of the cuda backend.
* Adds an error check.
* Handles indice==NULL for identity restriction.
* Adds an CeedElemRestrictionCreateBlocked_Cuda that errors.
* Adds VectorRestor in BasisApply.
* Attempt to make SetValue function.
* Adds the memState variable inside the CeedVectorCuda and uses it.
* Fix a bug that was passing the pointer instead of the address of
the pointer to CeedFree......
* Some cleaning.
* Fix a logic error in VectorGetArray.
- Now allocates an array whatever the memState is
* Fix: Basis apply checks if emode!=CEED_EVAL_WEIGHT before getting u array.
* Cleaning for PR to libCEED repo.
* Uses Setters instead of direct struct access.
* Use Getters instead of direct structure access.
* minor forgot to get ierr in after calling some functions.
* Forget to add the SetValue function in Cuda Vector...
* minor: Works even better if we give the right function to SetValue
* Fix: Set the right function for RestrictionBlocked...
* Replace some CeedChk with CeedChk_Cu
* Fix: Replace 'vec' by its length 'length'.
* Adds some CeedChk.
* Fix the Cuda_context_destroyed bug
* Adds error checking to cudaMemcpyH2D but not to D2H since it errors...
* Use Occa file approach to read Cuda QFunctions.
* Fix a few bugs
* Test a new approach to pass the qFunction fields.
* Remove typo in t400.cu and remove debugging printf.
* Append the Cuda Fields struct at the beginning of each qFunction .cu file.
* Add qFunctions for t500, t501 and t502.
* Correct cu functions for t502.
* Memcpy the ctx on the device at each Apply call.
* Checks errors in VectorSync.
* Modifies a bit the memState logic.
* Adds a Cuda implementation of Operator instead of using Ref.
* Remove some unnecessary GetArray in OperatorApply.
* Does a trick for CEED_EVAL_NONE output.
* Fix a bug in CEED_EVAL_WEIGHT.
* Applies the QFunction to all elements, not only the first one...
* A debugging commit.
* Fix: CEED_EVAL_WEIGHT use nelem in BasisApply_Cuda.
* Rewritten weight kernel.
* All C tests pass.
* Cleaning for PR.
* Remove unneeded commented code.
* Remove commented code.
* Remove the check on the pointer in RestoreArray.
* Fix a CeedFree bug.
* Fix the edata memory leak.
* Fix misuse of CeedFree.
* Allocate device memory if there is a magic context appearing due to Fortran.
* make style
* Adds cu files for petsc/bp1 mfem/bp1 and ceed/ex1.
* Remove a warning.
* Remove switch case fall-through to remove warnings.
* Remove some bugs, make other bugs show up.
* Implement the Identity Restriction.
* Size correctly the restriction.
* Modify GPU restriction kernels instead of making dummy identity.
* Add cudaFree(0) before compiling to initialize the context (?!)
* Rewritten weight kernel.
* Fix typo in weight kernel.
* Fix typo in weight kernel.
* Add bp1.cu and bp3.cu for the petsc examples.
* Rewritten interp kernel for Cuda backend.
The interp kernel was not writing data in the layout that the
QFunction is expecting.
* Rewritten grad kernel for Cuda backend.
- Small fix on the interp kernel.
- The grad kernel was not writing data in the layout that the
QFunction is expecting.
* Fix the logic in interp kernel.
* Fix the shared memory size.
* Modify grad kernel to take into account the libCEED data layout.
* Add a cuda file for mfem/bp3.
* Add synchronisation to mfem bp1 and bp3.
* Fix the grad and weight kernel to have the correct data layout.
* Forgotten cu files for Fortran.
* Corrects some typos in the Cuda file for petsc/bp1.
* Add Cuda files for the new t401 test.
* Update the logic on the transfer of the qFunction ctx.
* Write petsc/bp1 in C++ instead of C.
* Minor fix: typo
* Add synchronization to petsc/bp1+bp3.
* Removes the sync on rho in petsc/bp1+bp3.
* Integrate Jeremy Thompson's remarks to the PR.
* Use CeedError instead of exit(1).
* Removes -lstdc++ and adds Ceed in front of DeviceSetValue function.
* Removes synchronization on 'u' in the Apply.
* minor
* make style
* Use the new context interface.
* Minor
* Minor.
* Minor.
* Make style using align-pointer=name
* Minor: some cleaning
* CeedQFunctionUser: write documentation
* Make NVCC compatible with new OPT compiler options
|
| #
ae228676
|
| 11-Jan-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #182 from CEED/avx
AVX Backend
|
| #
48fffa06
|
| 17-Dec-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
avx vectorized backend
Edge cases for AVX BasisApply
Priority adjustment to match libXSMM branch
Remove scalar/simd mix for Intel
Check for CC AVX support
AVX: proposed doc and makefile detection update
|
| #
2bc6258e
|
| 09-Jan-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #189 from CEED/fortran-tests-fform, close issue #84
Use -ffree-form for Fortran test suite
|
| #
10da579b
|
| 31-Dec-2018 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Use -ffree-form for Fortran test suite
|
| #
dba52a49
|
| 04-Sep-2018 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #147 from CEED/opt-to-vec
Rename /cpu/self/opt to /cpu/self/blocked
|
| #
4a2e7687
|
| 04-Sep-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Rename /cpu/self/opt to /cpu/self/blocked
|
| #
a7dfafed
|
| 29-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'new-basis-shapes' [PR #97]
* new-basis-shapes: Non-tensor bases
|
| #
a8de75f0
|
| 17-Aug-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Non-tensor bases
Add simplex integration test
Add simplex grad test
Style changes
Common header for t32* tests, reorder grad
Add t520 operator test with 2D simplex basis
Add t501 and t521 non-zero operator tests
Adjust Fortran tests for clarity
Explicitly cast Fortran values as doubles in tests
Modify PR97 for new Fortran interface
Flatten CEED_TOPO to include dimension
Rebase PR 97 to new testing convention
Reorder ElemTopo to embed dimension bitwise, doc fix
Switch numbering convention, add GetTopologyDimension
Fortran headers for t31* and t51*, adjust PR97 for COLLOCATED typo
|
| #
d554636c
|
| 21-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'veselin/makefile-info' [PR #136]
* veselin/makefile-info:
  Makefile: use pattern rule for MAGMA's multiple targets
  Make: use CURDIR in pattern rules to match generated *.d absolute paths
  Makefile: report backend status more compactly and precisely
  Makefile: cancel built-in and old-fashioned implicit rules
  Makefile: add .cu suffix; clean removes all of $(LIBDIR).
  Makefile: remove redundant generation of 'backends/magma/.DIR'.
  Makefile: remove some redundant output from 'make info'.
  Makefile: print the enabled backends only when building libceed.
  In Makefile, add 'info' target; always print enabled backends.
|
| #
0c3c1e1f
|
| 21-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'jed/makefile-cleaning' into veselin/makefile-info
* jed/makefile-cleaning:
  Makefile: use pattern rule for MAGMA's multiple targets
  Make: use CURDIR in pattern rules to match generated *.d absolute paths
  Makefile: report backend status more compactly and precisely
  Makefile: cancel built-in and old-fashioned implicit rules
|
| #
29715310
|
| 21-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Makefile: use pattern rule for MAGMA's multiple targets
Non-pattern rules do not support multiple targets built by a single recipe. Instead, they mean that each target can be built by running the recipe multiple times (the recipe has different $@ each time it is run). This is noisy and especially a problem with parallel make where simultaneous invocations could collide.
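The difference described in this message can be sketched with a small GNU Make fragment. This is only an illustration: the file names (`data.in`, `data.out1`, `data.out2`, `*.tmpl`) and the `cp` recipes are hypothetical and are not taken from the libCEED Makefile.

```make
# Hypothetical file names throughout; not the actual MAGMA rules.

# Explicit (non-pattern) rule with two targets: GNU make treats this as two
# independent rules that share one recipe. Each target can trigger its own
# run of the recipe (with $@ set to that target), so under `make -j` two
# copies of the recipe may execute at the same time and collide.
data.out1 data.out2: data.in
	cp $< data.out1
	cp $< data.out2

# Pattern rule with two targets: make knows that a single run of the recipe
# produces both files, so the recipe runs once per matching stem ($*).
%.c %.h: %.tmpl
	cp $< $*.c
	cp $< $*.h
```

With the pattern rule, requesting both `foo.c` and `foo.h` from an existing `foo.tmpl` should lead to one invocation of the recipe, which is the behavior the commit relies on for targets generated together.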
|
| #
58e8d3b7
|
| 21-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Make: use CURDIR in pattern rules to match generated *.d absolute paths
This is benign for normal files because stat doesn't care if the path is relative or absolute, but if the *.c or *.cu source file is generated, then it matters whether they match or not. Specifically, it can cause a pattern rule to either not match or to match twice (once as a relative path via explicit prerequisite in the makefile and again via the *.d depending on abspath).
|
| #
d20f937d
|
| 21-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Makefile: report backend status more compactly and precisely
|
| #
da72e7fc
|
| 21-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Makefile: cancel built-in and old-fashioned implicit rules
|
| #
df0ef7e4
|
| 20-Aug-2018 |
Veselin Dobrev <dobrev@llnl.gov> |
Makefile: add .cu suffix; clean removes all of $(LIBDIR).
|
| #
9011e65c
|
| 20-Aug-2018 |
Veselin Dobrev <dobrev@llnl.gov> |
Makefile: remove redundant generation of 'backends/magma/.DIR'.
|
| #
f2fc93d4
|
| 20-Aug-2018 |
Veselin Dobrev <dobrev@llnl.gov> |
Makefile: remove some redundant output from 'make info'.
|
| #
23072ed2
|
| 20-Aug-2018 |
Veselin Dobrev <dobrev@llnl.gov> |
Makefile: print the enabled backends only when building libceed.
A few other small tweaks in the Makefile.
|
| #
bf3e26f6
|
| 19-Aug-2018 |
Veselin Dobrev <dobrev@llnl.gov> |
In Makefile, add 'info' target; always print enabled backends.
Also, make a few other small tweaks in the Makefile.
|
| #
1b95a3aa
|
| 15-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'jed/occa-skip-ocl' [PR #124]
* jed/occa-skip-ocl: test: remove /ocl/occa from BACKENDS
|