| #
821dffb6
|
| 24-Mar-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
README typo fix
|
| #
55ae60f9
|
| 14-Mar-2019 |
Yohann <yohann.dudouit@gmail.com> |
Simple Cuda backend using one thread per element (#195)
Thanks-to: Jeremy Thompson
* Take into account the compute capability of the GPU
* Add the cuda/reg backend and rename cuda to cuda/ref.
- cuda/reg uses a simple approach where each element is
processed by one thread. This approach is expected to be
efficient for 1D and 2D problems, but very inefficient
as soon as the kernels start to spill, which should arise
around Q1D=4 for 3D problems.
* Compilation takes into account the deviceId
* Make style
* Remove dead code in cuda qFunctions.
* Cuda-reg specialized Restriction.
* Split the Prolongation operator into Identity/not Identity.
* Remove "#pragma unroll" until further perf investigation.
* README update
* Add a description of cuda/reg.
* Add CompositeOperator msg to CUDA backends
|
| #
84a01de5
|
| 12-Mar-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Serial and Blocked AVX Backends (#198)
* Add serial AVX backend
* Style and README changes
* Simplify AVX serial tensor loop
* Minor performance improvement
* C=1 AVX scalar case
* Increase use of AVX commands for edge cases
* Prep for eventual Tensor Object
* Comment updates
* Readme update
* Update README
* Refactor to reduce code
* Increase vectorization in remainder of columns
* Vectorize column remainder on C=1 case
* Switch to static inlining for AVX tensor contract
* Tidying for merge
* make style
* Style cleanup
* Full register use for columns
* Make style
|
| #
0a1d75a0
|
| 06-Feb-2019 |
Valeria Barra <39932030+valeriabarra@users.noreply.github.com> |
Merge pull request #206 from CEED/wording
Readability changes
|
| #
6b75b9c5
|
| 06-Feb-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
spelling
|
| #
293f4b1a
|
| 06-Feb-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Update README.md
|
| #
29d6e734
|
| 06-Feb-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Readme update
|
| #
4d1cd9fc
|
| 06-Feb-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Add Nek to Travis (#169)
* Add test mode to Nek BP1 and BP3, improve Nek BPs
* Fix OCCA identity rst for multifield, minor NekBP1 fix
* Improve Nek run script
* Add Nek5K to prove-all
* Update travis yml for Nek5K
* Make style
* Adjust Travis yml
* Combine Nek run bash scripts
* Minor Nek script improvements
* Update to Nek 18.0 and reduce number of Nek compiler warnings
* Document required Nek5k version
* Remove stray command
* Remove extra file
* Adapt Nek for CUDA backend
* Fix Nek script string comparison
* Modify Nek script for better exit codes
* typo fix
* Modify the CU function names in nek/bp1.cu and nek/bp3.cu
* .cu file consistency
* Tidy Travis
* Tidy Travis
* Operator fixes
|
| #
0f918338
|
| 30-Jan-2019 |
Valeria Barra <39932030+valeriabarra@users.noreply.github.com> |
Merge pull request #202 from CEED/XSMM-fix
Fix LIBXSMM capitalization
|
| #
a0ecefdd
|
| 30-Jan-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Fix LIBXSMM capitalization
|
| #
2f4d9adb
|
| 26-Jan-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Benchmarking (#187)
* Add make benchmarks
* Various tweaks related to the benchmarks.
* In Makefile:
* target 'all' now builds the library, all tests and examples
* the old 'all' target is now called 'par'
* the target 'examples' will also build the MFEM and PETSc examples if
the respective library is available.
In the benchmarks/ directory:
* remove 'config.sh'
* cleanup unused stuff from 'benchmark.sh'.
* Fix postprocess scripts, convert to Python 3
* Small update in README.md
* Set benchmark cg its max, update gitignore
* Minor makefile fix
* In Makefile, add 'par' to the list of phony targets.
* In benchmarks/postprocess-table.py, sort the table by backend first.
* Small update in examples/petsc/Makefile - add a comment that
PETSC_ARCH can be undefined/empty, e.g. when using PETSc installed
through Spack.
* In Makefile, update the benchmarking targets:
* add separate targets for individual tests: `bench-petsc-bp1`,
`bench-petsc-bp3`, etc
* `make benchmarks` runs all defined benchmarks.
Update README.md to reflect the above changes.
|
| #
f6a4878d
|
| 23-Jan-2019 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #186 from CEED/libxsmm
Initial libXSMM Backend
|
| #
8d713cf6
|
| 20-Dec-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Initial libXSMM backend
|
| #
ae228676
|
| 11-Jan-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #182 from CEED/avx
AVX Backend
|
| #
48fffa06
|
| 17-Dec-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
avx vectorized backend
Edge cases for AVX BasisApply
Priority adjustment to match libXSMM branch
Remove scalar/simd mix for Intel
Check for CC AVX support
AVX: proposed doc and makefile detection update
|
| #
dba52a49
|
| 04-Sep-2018 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #147 from CEED/opt-to-vec
Rename /cpu/self/opt to /cpu/self/blocked
|
| #
4a2e7687
|
| 04-Sep-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Rename /cpu/self/opt to /cpu/self/blocked
|
| #
f82d2baa
|
| 15-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'cleanup' [PR #118]
Merge branch 'cleanup' [PR #118]
* cleanup:
  make style, excluding backends/{occa,magma}
  make style: fix interface and include paths
  docs: fix capitalization
  doc: add developer notes on shape and adopt convention
  Standardize CeedIntPow and CeedIntMin
  Move and document CeedIntMin, document CeedPowInt
  Add function levels
  Update Doxygen output naming
  Add Test List to Doxygen
  Doxygen interface comment updates
  Remove redundant doxygen comments
  Documentation updating for t500
  Move ceed* files to 'interface' directory, comment cleanup
  Further CPU backend commenting and cleaning
  Reorder tests, renumber for future expansion
  Clean up and tighten Opt and Ref backends
|
| #
dfdf5a53
|
| 12-Aug-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Add function levels
|
| #
9ddbf157
|
| 09-Aug-2018 |
jeremylt <jeremy.thompson@colorado.edu> |
Documentation updating for t500
|
| #
583a6f96
|
| 07-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Add coverage badge
|
| #
b1b1662c
|
| 01-Aug-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'jed/makefile-optflags' [PR #102]
* jed/makefile-optflags:
  Makefile: use LINK.* for clearer output/less duplication
  Makefile: add OPT for all-language opt/dbg flags
|
| #
323c739c
|
| 24-Jul-2018 |
Jed Brown <jed@jedbrown.org> |
Makefile: add OPT for all-language opt/dbg flags
|
| #
389b3d93
|
| 19-Jul-2018 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'jed/active-passive' [PR #41]
Merge branch 'jed/active-passive' [PR #41]
* jed/active-passive: (58 commits)
  Remove spurious comments
  Make style
  [PETSc] Modify Makefile for abspath for .okl
  [OCCA] PETSc bp1 works, but .okl error in prove-all
  [OCCA] Fix qfunction not shifting output pointers
  [OCCA] Replacing series of 'if's with switch
  Modify Makefile to include ceed.pc for prove-all
  Fix error in Makefile checking for MFEM_DIR
  Update README.md
  Update Tmpl to use highest priority /cpu/self
  [OCCA] Rework switch statement for AllocOpOut and AllocOpIn
  PETSc bp1: update okl kernels and extract ComputeErrorMax
  Add CeedVectorGetLength
  Occa: sync to host for passive fields
  PETSc bp1: compute collocated error vector instead of reducing in kernel
  Occa: copy OperatorApply output to "used" pointer
  Add check for MFEM_DIR to Makefile
  [OCCA] Add zeroing of outvecs
  Further work on Nek5000 BPs, added error checking to OpApply
  [NEK][WIP] Modifying BPs
  ...
|
| #
97a942b6
|
| 10-Jul-2018 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Update README.md
|