Change *.cpp to *.cxx
Fix the inconsistent usage of #if [!]defined XXX compared to #if [!]defined(XXX). Thanks-to: Pierre Jolivet. An LLM (Claude) detected the inconsistency but was not used to remove it.
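Both spellings are valid C preprocessor syntax and evaluate identically, so the fix is purely about consistency. The sketch below assumes the parenthesized form is the convention being kept; `FEATURE_X` and `FEATURE_Y` are hypothetical macros, not PETSc ones.

```c
/* Hypothetical macros for illustration; FEATURE_Y is deliberately left undefined. */
#define FEATURE_X 1

/* Positive test written in the parenthesized form. */
int feature_x_enabled(void)
{
#if defined(FEATURE_X)
  return 1;
#else
  return 0;
#endif
}

/* Negated test written in the same parenthesized form. */
int feature_y_disabled(void)
{
#if !defined(FEATURE_Y)
  return 1;
#else
  return 0;
#endif
}
```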
MPI: update the GPU-awareness check
Add logging of GPU energy: remove the unneeded PETSC_HAVE_DEVICE macro; -log_view_gpu_energy requires CUDA version >= 12.2; use PetscDefined() instead of the macro
checkbadSource: enforce proper style in makefiles
Fix CUDA 13 API incompatibilities. Co-authored-by: Satish Balay <balay@mcs.anl.gov>
Add missing "s" for isascii and issundials
Merge remote-tracking branch 'origin/release'
petscerror.h: iwyu export semi-private header. petscerror.h cannot be included directly unless petscsys.h has already been included (and petscsys.h unconditionally includes petscerror.h). This updates the docs and uses the IWYU pragma export so that IDEs (clangd) will automatically include the correct header when you use things like SETERRQ that appear in petscerror.h. Fix #1254
CUPM: Fix some mult routines and make some small performance improvements. VecSeq_CUPM: fix ::Dot, ::TDot, ::WAXPYAsync, and ::AXPBYAsync to work with non-device vectors. Mat_SeqDenseCUPM: fix ::SetRandom for HIP. Mat_SeqDenseCUPM: implement conjugate on the device. Mat_SeqDenseCUPM: fix MatMult() (and friends) when the vector is not on the device; this implementation assumes that moving the matrix to the host is more expensive than moving the vector to the device, so temporary device copies of the host vectors are used. VecCUPM: avoid device synchronization in some cases of ResetArray(); the documentation says that PetscDeviceSynchronization() is only needed if CopyToDevice_() resulted in an HtoD memcpy, which would only happen if v->offloadmask == PETSC_OFFLOAD_CPU, so testing this condition lets us skip the synchronization. This improves the performance of MatDenseRestoreColumnVecWrite() in performance-critical loops. CUPM: use thrust::hip::par_nosync
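The ResetArray() change comes down to checking the offload state before the copy and synchronizing only when an HtoD memcpy was actually issued. The sketch below illustrates that logic. `OffloadMask`, `SketchVec` and the `sketch_*` functions are hypothetical stand-ins, not PETSc's real types or CUPM internals, and the "synchronization" here only counts calls.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PETSc's offload states and vector type. */
typedef enum { OFFLOAD_UNALLOCATED, OFFLOAD_CPU, OFFLOAD_GPU, OFFLOAD_BOTH } OffloadMask;

typedef struct {
  OffloadMask offloadmask;
  int         syncs; /* number of device synchronizations issued */
} SketchVec;

/* Stand-in for CopyToDevice_(): an HtoD memcpy is needed only when the
   up-to-date data lives solely on the host; afterwards both copies agree. */
static void sketch_copy_to_device(SketchVec *v)
{
  if (v->offloadmask == OFFLOAD_CPU) v->offloadmask = OFFLOAD_BOTH;
}

/* Stand-in for a device synchronization; here it only counts calls. */
static void sketch_device_synchronize(SketchVec *v) { v->syncs++; }

/* Reset path: record whether the copy will issue an HtoD memcpy before
   performing it, and synchronize only in that case. */
void sketch_reset_array(SketchVec *v)
{
  const bool issued_memcpy = (v->offloadmask == OFFLOAD_CPU);
  sketch_copy_to_device(v);
  if (issued_memcpy) sketch_device_synchronize(v);
}
```

A vector whose data is already on the device takes the fast path and never pays for the synchronization. This is the case that matters in tight loops.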
Check for the env variable TORCHELASTIC_RUN_ID to "ensure" the LOCAL_RANK env variable is truly associated with PyTorch. Reported-by: Stefano Zampini
Set proper defaults for GPU devices when running under PyTorch/NCCL. Use getenv("LOCAL_RANK"). Reported-by: Hong Zhang
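Together, these two commits describe the following approach: read LOCAL_RANK with getenv(), but trust it only when TORCHELASTIC_RUN_ID is also set. The sketch below follows that approach. The helper names are hypothetical. The round-robin mapping from local rank to device is an assumption about what a "proper default" might be, not PETSc's documented behavior.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: trust LOCAL_RANK only when TORCHELASTIC_RUN_ID is
   also present. Returns -1 if either is absent or the rank is malformed. */
int sketch_local_rank(const char *run_id, const char *local_rank)
{
  char *end;
  long  r;

  if (!run_id || !local_rank) return -1;
  r = strtol(local_rank, &end, 10);
  if (end == local_rank || *end != '\0' || r < 0 || r > INT_MAX) return -1;
  return (int)r;
}

/* Reads both variables from the environment with getenv(). */
int sketch_local_rank_from_env(void)
{
  return sketch_local_rank(getenv("TORCHELASTIC_RUN_ID"), getenv("LOCAL_RANK"));
}

/* Hypothetical default: map the local rank round-robin onto the visible
   devices, falling back to device 0 when no usable rank is known. */
int sketch_default_device(int local_rank, int ndevices)
{
  if (local_rank < 0 || ndevices <= 0) return 0;
  return local_rank % ndevices;
}
```

Splitting the parsing from the getenv() calls keeps the logic testable without modifying the process environment.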
Add the compiler flags '-Wconversion', '-Wno-sign-conversion', '-Wno-float-conversion', '-Wno-implicit-float-conversion' to CI. Also fix the code in the repository to compile cleanly with these flags in the CI
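With '-Wconversion' enabled, GCC and Clang warn about implicit integer narrowing. The sketch below shows one typical way to make such code compile cleanly: check the range, then cast explicitly. `sketch_narrow_to_int` is a hypothetical helper. The actual fixes in the repository may differ.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper. Writing `*out = len;` would trigger -Wconversion
   because an int64_t may not fit in an int. Checking the range first and
   then casting explicitly makes the narrowing intentional and warning-free. */
int sketch_narrow_to_int(int64_t len, int *out)
{
  if (len < INT_MIN || len > INT_MAX) return 1; /* value would not fit */
  *out = (int)len;
  return 0;
}
```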
Config: get rid of PETSC_HAVE_OMPI_MAJOR_VERSION and include it in petscpkg_version.h
cupm: fix visibility to build without warnings. Thanks-to: Lawrence Mitchell
SYS: fix typos in printf with 64-bit indices
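A hard-coded `%d` is the wrong conversion for 64-bit index values. In plain C, the portable alternative is the PRId64 macro from <inttypes.h>, shown in the sketch below. `SketchIndex` and `sketch_format_row` are hypothetical names for illustration, not PETSc's index type or printing routines.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical index type for a build with 64-bit indices. */
typedef int64_t SketchIndex;

/* Formats "row <i> of <n>" into buf. PRId64 expands to the correct
   conversion specifier for int64_t on each platform, so 64-bit values
   are printed without truncation or undefined behavior. */
int sketch_format_row(char *buf, size_t len, SketchIndex i, SketchIndex n)
{
  return snprintf(buf, len, "row %" PRId64 " of %" PRId64, i, n);
}
```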
Sys: add error messages on a discrepancy in CUDA arches between configure time and runtime; also report the runtime CUDA arch in the log view
Consolidate PETSc stream types
Minor fixes to website material
LIBBASE is no longer used by make, so remove it
Fix bugs in handling PetscViewerGetSubViewer() and tabbing in ASCII viewers. Reported-by: Pierre Jolivet
Rename rules.doc and rules.utils because GitLab treats the former as an MS Word document. Thanks-to: Jed Brown
Remove the DIRS variable and unneeded tabs from all makefiles since they are no longer needed. Commit-type: housekeeping
Convert all header guards to pragma once