| /libCEED/doc/sphinx/source/ |
| intro.md | 8 This metric for computational efficiency made sense historically, when the performance was mostly l… 9 A more relevant performance plot for current state-of-the-art high-performance machines (for which … 21 Furthermore, software packages that provide high-performance implementations have often been specia… 22 … can unobtrusively be integrated in new and legacy software to provide performance portable interf…
|
| releasenotes.md | 117 - Various performance enhancements, analytic matrix-free and assembled Jacobian, and PETSc solver c… 188 …s to `CeedQFunctionContext` data as an optional feature to improve GPU performance. By default, ca… 214 ### Performance improvements 218 - Solid mechanics mini-app updated to explore the performance impacts of various formulations in th… 240 ### Performance improvements 243 - New HIP backends for improved tensor basis performance: `/gpu/hip/shared` and `/gpu/hip/gen`. 276 ### Performance improvements 278 - OCCA backend rebuilt to facilitate future performance enhancements. 319 ### Performance Improvements 321 - MAGMA backend performance optimization and non-tensor bases.
|
| libCEEDdev.md | 18 If there are no performance specific considerations, it is generally recommended to include a basic… 78 These backends use shared memory to improve performance for the {ref}`CeedBasis` kernels. 83 …o apply the action of the {ref}`CeedOperator`, significantly improving performance by eliminating … 87 These backends provide better performance for {ref}`CeedBasis` kernels but do not have the improvem…
|
| references.bib | 104 title = {Roofline: an insightful visual performance model for multicore architectures}, 128 …title = {On the Order of Accuracy and Numerical Performance of Two Classes of Finite Volume WE…
|
| /libCEED/doc/papers/joss/ |
| paper.md | 4 - high-performance computing 76 `libCEED` provides portable performance via run-time selection of implementations optimized for CPU… 81 …ations and discretization libraries, `libCEED` provides a platform for performance engineering and… 147 # Performance benchmarks 149 …performance of high-order finite element implementations [@Fischer2020scalability; @CEED-ECP-paper… 151 …Performance for BP3 using the \texttt{xsmm/blocked} backend on a 2-socket AMD EPYC 7452 (32-core, …
|
| paper.bib | 50 journal = {International Journal of High Performance Computing Applications}, 111 title = {{CEED ECP Milestone Report: Improve performance and 132 title={Scalability of high-performance PDE solvers}, 134 journal={The International Journal of High Performance Computing Applications}, 189 …title = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython … 240 …performance of the interpreter is often a barrier when scaling to larger data sets. This paper pre… 397 title={Roofline: an insightful visual performance model for multicore architectures},
|
| /libCEED/julia/LibCEED.jl/docs/src/ |
| Misc.md | 5 performance, it is important to use specialized versions of these operations for 18 result in a type instability, and give poor performance.
|
| index.md | 111 The macro version can provide better performance if a closure is required, and
|
| /libCEED/ |
| README.md | 13 libCEED provides fast algebra for element-based discretizations, designed for performance portabili… 192 … `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance. 194 The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance. 201 …on the [LIBXSMM](https://github.com/libxsmm/libxsmm) package to provide vectorized CPU performance. 205 The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA. 207 The `/gpu/hip/*` backends provide GPU performance strictly using HIP. 211 The `/gpu/sycl/*` backends provide GPU performance strictly using SYCL. 227 …e libCEED backends use non-deterministic operations, such as `atomicAdd` for increased performance. 474 …title = {{H}igh-performance operator evaluations with ease of use: {libCEED}'s {P}ython interf…
|
| .gitlab-ci.yml | 45 …desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json 59 performance: performance.json 95 …desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json 136 performance: performance.json 343 …desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json 372 # performance: performance.json 452 …desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json 476 performance: performance.json
|
| setup.py | 68 sparse matrices, and can achieve very high performance on modern CPU and GPU
|
| CITATION.cff | 173 title: "High-performance operator evaluations with ease of use: libCEED's Python interface"
|
| /libCEED/examples/petsc/ |
| bps.c | 53 // Main body of program, called in a loop for performance benchmarking purposes 213 // First run's performance log is not considered for benchmarking purposes in RunWithDM() 231 // -- Performance logging in RunWithDM() 239 // -- Performance logging in RunWithDM() 262 PetscCall(PetscPrintf(rp->comm, " Performance:\n")); in RunWithDM()
|
| bpssphere.c | 243 // -- Performance logging in main() 252 // -- Performance logging in main() 275 PetscCall(PetscPrintf(comm, " Performance:\n")); in main()
|
| bpsswarm.c | 335 // -- Performance logging in main() 344 // -- Performance logging in main() 367 PetscCall(PetscPrintf(comm, " Performance:\n")); in main()
|
| multigrid.c | 463 // -- Performance logging in main() 472 // -- Performance logging in main() 502 PetscCall(PetscPrintf(comm, " Performance:\n")); in main()
|
| bpsraw.c | 703 // First run's performance log is not considered for benchmarking purposes in main() 721 // -- Performance logging in main() 730 // -- Performance logging in main() 753 PetscCall(PetscPrintf(comm, " Performance:\n")); in main()
|
| /libCEED/examples/solids/ |
| elasticity.c | 115 // Performance logging in main() 209 // Performance logging in main() 215 // Performance logging in main() 283 // Performance logging in main() 374 // Performance logging in main() 573 // Performance logging in main() 593 // Performance logging in main() 652 // Performance logging in main() 727 " Performance:\n" in main()
|
| /libCEED/julia/LibCEED.jl/src/ |
| CeedVector.jl | 266 Because of performance issues involving closures, if `f` is a complex operation, it may be 267 more efficient to use the macro version `@witharray` (cf. the section on "Performance of 269 documentation](https://docs.julialang.org/en/v1/manual/performance-tips) and related [GitHub
|
| /libCEED/benchmarks/ |
| README.md | 3 This directory contains benchmark problems for performance evaluation of libCEED
|
| /libCEED/doc/bib/ |
| references.bib | 48 …title = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython …
|
| /libCEED/rust/libceed/ |
| README.md | 6 This crate provides an interface to [libCEED](https://libceed.org), which is a performance-portable…
|
| /libCEED/include/ceed/jit-source/magma/ |
| magma-basis-interp-deriv-nontensor.h | 34 …// unrolling this loop yields dramatic performance drop using hipcc, so let the compiler decide (n… in magma_basis_nontensor_device_n() 74 …// unrolling this loop yields dramatic performance drop using hipcc, so let the compiler decide (n… in magma_basis_nontensor_device_t() 120 …// unrolling this loop yields dramatic performance drop using hipcc, so let the compiler decide (n… in magma_basis_nontensor_device_ta()
|
| /libCEED/examples/ |
| README.md | 14 …retizations (CEED) uses Bakeoff Problems (BPs) to test and compare the performance of high-order f…
|
| /libCEED/backends/sycl-ref/ |
| ceed-sycl-ref-basis.sycl.cpp | 108 // Use older version of sycl workgroup barrier for performance reasons in CeedBasisApplyInterp_Sycl() 109 … // Can be updated in future to align with SYCL2020 spec if performance bottleneck is removed in CeedBasisApplyInterp_Sycl() 209 // Use older version of sycl workgroup barrier for performance reasons in CeedBasisApplyGrad_Sycl() 210 … // Can be updated in future to align with SYCL2020 spec if performance bottleneck is removed in CeedBasisApplyGrad_Sycl()
|