Searched full:performance (Results 1 – 25 of 35) sorted by relevance

/libCEED/doc/sphinx/source/
intro.md
8 This metric for computational efficiency made sense historically, when the performance was mostly l…
9 A more relevant performance plot for current state-of-the-art high-performance machines (for which …
21 Furthermore, software packages that provide high-performance implementations have often been specia…
22 … can unobtrusively be integrated in new and legacy software to provide performance portable interf…
releasenotes.md
117 - Various performance enhancements, analytic matrix-free and assembled Jacobian, and PETSc solver c…
188 …s to `CeedQFunctionContext` data as an optional feature to improve GPU performance. By default, ca…
214 ### Performance improvements
218 - Solid mechanics mini-app updated to explore the performance impacts of various formulations in th…
240 ### Performance improvements
243 - New HIP backends for improved tensor basis performance: `/gpu/hip/shared` and `/gpu/hip/gen`.
276 ### Performance improvements
278 - OCCA backend rebuilt to facilitate future performance enhancements.
319 ### Performance Improvements
321 - MAGMA backend performance optimization and non-tensor bases.
libCEEDdev.md
18 If there are no performance specific considerations, it is generally recommended to include a basic…
78 These backends use shared memory to improve performance for the {ref}`CeedBasis` kernels.
83 …o apply the action of the {ref}`CeedOperator`, significantly improving performance by eliminating …
87 These backends provide better performance for {ref}`CeedBasis` kernels but do not have the improvem…
references.bib
104 title = {Roofline: an insightful visual performance model for multicore architectures},
128 …title = {On the Order of Accuracy and Numerical Performance of Two Classes of Finite Volume WE…
/libCEED/doc/papers/joss/
paper.md
4 - high-performance computing
76 `libCEED` provides portable performance via run-time selection of implementations optimized for CPU…
81 …ations and discretization libraries, `libCEED` provides a platform for performance engineering and…
147 # Performance benchmarks
149 performance of high-order finite element implementations [@Fischer2020scalability; @CEED-ECP-paper…
151 Performance for BP3 using the \texttt{xsmm/blocked} backend on a 2-socket AMD EPYC 7452 (32-core, …
paper.bib
50 journal = {International Journal of High Performance Computing Applications},
111 title = {{CEED ECP Milestone Report: Improve performance and
132 title={Scalability of high-performance PDE solvers},
134 journal={The International Journal of High Performance Computing Applications},
189 …title = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython …
240 performance of the interpreter is often a barrier when scaling to larger data sets. This paper pre…
397 title={Roofline: an insightful visual performance model for multicore architectures},
/libCEED/julia/LibCEED.jl/docs/src/
Misc.md
5 performance, it is important to use specialized versions of these operations for
18 result in a type instability, and give poor performance.
index.md
111 The macro version can provide better performance if a closure is required, and
/libCEED/
README.md
13 libCEED provides fast algebra for element-based discretizations, designed for performance portabili…
192 … `/cpu/self/opt/*` backends are written in pure C and use partial e-vectors to improve performance.
194 The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU performance.
201 …on the [LIBXSMM](https://github.com/libxsmm/libxsmm) package to provide vectorized CPU performance.
205 The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.
207 The `/gpu/hip/*` backends provide GPU performance strictly using HIP.
211 The `/gpu/sycl/*` backends provide GPU performance strictly using SYCL.
227 …e libCEED backends use non-deterministic operations, such as `atomicAdd` for increased performance.
474 …title = {{H}igh-performance operator evaluations with ease of use: {libCEED}'s {P}ython interf…
.gitlab-ci.yml
45 …desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json
59 performance: performance.json
95 …desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json
136 performance: performance.json
343 …desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json
372 # performance: performance.json
452 …desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json
476 performance: performance.json
setup.py
68 sparse matrices, and can achieve very high performance on modern CPU and GPU
CITATION.cff
173 title: "High-performance operator evaluations with ease of use: libCEED's Python interface"
/libCEED/examples/petsc/
bps.c
53 // Main body of program, called in a loop for performance benchmarking purposes
213 // First run's performance log is not considered for benchmarking purposes in RunWithDM()
231 // -- Performance logging in RunWithDM()
239 // -- Performance logging in RunWithDM()
262 PetscCall(PetscPrintf(rp->comm, " Performance:\n")); in RunWithDM()
bpssphere.c
243 // -- Performance logging in main()
252 // -- Performance logging in main()
275 PetscCall(PetscPrintf(comm, " Performance:\n")); in main()
bpsswarm.c
335 // -- Performance logging in main()
344 // -- Performance logging in main()
367 PetscCall(PetscPrintf(comm, " Performance:\n")); in main()
multigrid.c
463 // -- Performance logging in main()
472 // -- Performance logging in main()
502 PetscCall(PetscPrintf(comm, " Performance:\n")); in main()
bpsraw.c
703 // First run's performance log is not considered for benchmarking purposes in main()
721 // -- Performance logging in main()
730 // -- Performance logging in main()
753 PetscCall(PetscPrintf(comm, " Performance:\n")); in main()
/libCEED/examples/solids/
elasticity.c
115 // Performance logging in main()
209 // Performance logging in main()
215 // Performance logging in main()
283 // Performance logging in main()
374 // Performance logging in main()
573 // Performance logging in main()
593 // Performance logging in main()
652 // Performance logging in main()
727 " Performance:\n" in main()
/libCEED/julia/LibCEED.jl/src/
CeedVector.jl
266 Because of performance issues involving closures, if `f` is a complex operation, it may be
267 more efficient to use the macro version `@witharray` (cf. the section on "Performance of
269 documentation](https://docs.julialang.org/en/v1/manual/performance-tips) and related [GitHub
/libCEED/benchmarks/
README.md
3 This directory contains benchmark problems for performance evaluation of libCEED
/libCEED/doc/bib/
references.bib
48 …title = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython …
/libCEED/rust/libceed/
README.md
6 This crate provides an interface to [libCEED](https://libceed.org), which is a performance-portable…
/libCEED/include/ceed/jit-source/magma/
magma-basis-interp-deriv-nontensor.h
34 …// unrolling this loop yields dramatic performance drop using hipcc, so let the compiler decide (n… in magma_basis_nontensor_device_n()
74 …// unrolling this loop yields dramatic performance drop using hipcc, so let the compiler decide (n… in magma_basis_nontensor_device_t()
120 …// unrolling this loop yields dramatic performance drop using hipcc, so let the compiler decide (n… in magma_basis_nontensor_device_ta()
/libCEED/examples/
README.md
14 …retizations (CEED) uses Bakeoff Problems (BPs) to test and compare the performance of high-order f…
/libCEED/backends/sycl-ref/
ceed-sycl-ref-basis.sycl.cpp
108 // Use older version of sycl workgroup barrier for performance reasons in CeedBasisApplyInterp_Sycl()
109 … // Can be updated in future to align with SYCL2020 spec if performance bottleneck is removed in CeedBasisApplyInterp_Sycl()
209 // Use older version of sycl workgroup barrier for performance reasons in CeedBasisApplyGrad_Sycl()
210 … // Can be updated in future to align with SYCL2020 spec if performance bottleneck is removed in CeedBasisApplyGrad_Sycl()