xref: /petsc/config/examples/arch-alcf-polaris.py (revision 9b43db70b9c026bfd5e783b6b6af8129a8c6066a)
15d94af91SJunchao Zhang#!/usr/bin/python3
25d94af91SJunchao Zhang
33ab125cbSJunchao Zhang# Use GNU compilers:
45d94af91SJunchao Zhang#
55d94af91SJunchao Zhang# Note cray-libsci provides BLAS etc. In summary, we have
61166ac79SJunchao Zhang# module use /soft/modulefiles
71166ac79SJunchao Zhang# module unload darshan
86afbcc8fSJunchao Zhang# module load PrgEnv-gnu cray-libsci nvhpc-mixed craype-accel-nvidia80 cudatoolkit-standalone/12.4.1
9e3df6193SJunchao Zhang# export MPICH_GPU_SUPPORT_ENABLED=1
10e3df6193SJunchao Zhang# export MPICH_GPU_IPC_ENABLED=0
113ab125cbSJunchao Zhang#
125d94af91SJunchao Zhang# $ module list
135d94af91SJunchao Zhang# Currently Loaded Modules:
14e3df6193SJunchao Zhang#   1) libfabric/1.15.2.0       7) nghttp2/1.57.0-ciat5hu         13) cray-mpich/8.1.28   19) cray-libsci/23.12.5
15e3df6193SJunchao Zhang#   2) craype-network-ofi       8) curl/8.4.0-2ztev25             14) cray-pmi/6.1.13     20) nvhpc-mixed/23.9
16e3df6193SJunchao Zhang#   3) perftools-base/23.12.0   9) cmake/3.27.7                   15) cray-pals/1.3.4     21) craype-accel-nvidia80
17e3df6193SJunchao Zhang#   4) darshan/3.4.4           10) cudatoolkit-standalone/12.4.1  16) cray-libpals/1.3.4
18e3df6193SJunchao Zhang#   5) gcc-native/12.3         11) craype/2.7.30                  17) craype-x86-milan
19e3df6193SJunchao Zhang#   6) spack-pe-base/0.6.1     12) cray-dsmml/0.2.2               18) PrgEnv-gnu/8.5.0
20e3df6193SJunchao Zhang#
21e3df6193SJunchao Zhang# With above, Cray-MPICH GPU-aware MPI works on a node but still fail with multiple nodes. In the latter case, you can
22f0b74427SPierre Jolivet# add the PETSc runtime option -use_gpu_aware_mpi 0 as a workaround.
235d94af91SJunchao Zhang
245d94af91SJunchao Zhangif __name__ == '__main__':
255d94af91SJunchao Zhang  import sys
265d94af91SJunchao Zhang  import os
275d94af91SJunchao Zhang  sys.path.insert(0, os.path.abspath('config'))
285d94af91SJunchao Zhang  import configure
295d94af91SJunchao Zhang  configure_options = [
305d94af91SJunchao Zhang    '--with-cc=cc',
315d94af91SJunchao Zhang    '--with-cxx=CC',
325d94af91SJunchao Zhang    '--with-fc=ftn',
335d94af91SJunchao Zhang    '--with-debugging=0',
345d94af91SJunchao Zhang    '--with-cuda',
355d94af91SJunchao Zhang    '--with-cudac=nvcc',
365d94af91SJunchao Zhang    '--with-cuda-arch=80', # Since there is no easy way to auto-detect the cuda arch on the gpu-less Polaris login nodes, we explicitly set it.
375d94af91SJunchao Zhang    '--download-kokkos',
385d94af91SJunchao Zhang    '--download-kokkos-kernels',
39*04ba64a0SVictor A. P. Magri    '--download-umpire',
401166ac79SJunchao Zhang    '--download-hypre',
415d94af91SJunchao Zhang  ]
425d94af91SJunchao Zhang  configure.petsc_configure(configure_options)
435d94af91SJunchao Zhang
443ab125cbSJunchao Zhang# Use NVHPC compilers
453ab125cbSJunchao Zhang#
463ab125cbSJunchao Zhang# Unset so that cray won't add -gpu to nvc even when craype-accel-nvidia80 is loaded
473ab125cbSJunchao Zhang# unset CRAY_ACCEL_TARGET
483ab125cbSJunchao Zhang# module load nvhpc/22.11 PrgEnv-nvhpc
493ab125cbSJunchao Zhang#
503ab125cbSJunchao Zhang# I met two problems with nvhpc and Kokkos (and Kokkos-Kernels) 4.2.0.
513ab125cbSJunchao Zhang# 1) Kokkos-Kernles failed at configuration to find TPL cublas and cusparse from NVHPC.
523ab125cbSJunchao Zhang#    As a workaround, I just load cudatoolkit-standalone/11.8.0 to let KK use cublas and cusparse from cudatoolkit-standalone.
533ab125cbSJunchao Zhang# 2) KK failed at compilation
543ab125cbSJunchao Zhang# "/home/jczhang/petsc/arch-kokkos-dbg/externalpackages/git.kokkos-kernels/batched/dense/impl/KokkosBatched_Gemm_Serial_Internal.hpp", line 94: error: expression must have a constant value
553ab125cbSJunchao Zhang#     constexpr int nbAlgo = Algo::Gemm::Blocked::mb();
563ab125cbSJunchao Zhang#                            ^
573ab125cbSJunchao Zhang# "/home/jczhang/petsc/arch-kokkos-dbg/externalpackages/git.kokkos-kernels/blas/impl/KokkosBlas_util.hpp", line 58: note: cannot call non-constexpr function "__builtin_is_device_code" (declared implicitly)
583ab125cbSJunchao Zhang#           KOKKOS_IF_ON_HOST((return 4;))
593ab125cbSJunchao Zhang#           ^
603ab125cbSJunchao Zhang#           detected during:
613ab125cbSJunchao Zhang#
623ab125cbSJunchao Zhang# It is a KK problem and I have to wait for their fix.
63