NAS
Wiki for information related to the NASA Advanced Supercomputing (NAS) facility.
Contents
Overview
| Key | Value | Notes |
|---|---|---|
| Machines | Pleiades | Compute |
| | Lou | Storage and Analysis |
| | Electra | Compute |
| | Endeavour | Compute |
| | Merope | Compute |
| Job Submission System | PBS | |
| Facility Documentation | Support Knowledgebase | |
How-Tos
How-Tos in Separate Wikis
Backup Data from Scratch Directories
This is done simply by copying data from the `/nobackup/$USER` directories to your home directory on Lou (`lfe`). The `/nobackup/$USER` directories are mounted onto `lfe`, so transfers should be done on `lfe`.
It is recommended to mirror the directory structure of your `/nobackup/$USER` directory on `lfe` so that the data can easily be restored to its original state. This is especially important if you use symlinks, as they are path dependent and will break if either the source file or the symlink itself is not in the correct location.
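The mirroring idea above can be sketched as a small path mapping. `mirror_path` is a hypothetical helper written for illustration (it is not part of `shiftc` or any NAS tool), and the backup root is assumed to be your home directory on `lfe`.

```shell
#!/usr/bin/env bash
# Sketch: map a path under the scratch root to the same relative path under
# a backup root, so the directory structure is mirrored.
# mirror_path is a hypothetical helper, not a NAS utility.
mirror_path() {
    local src="$1"            # absolute path to back up
    local scratch_root="$2"   # e.g. /nobackup/$USER
    local backup_root="$3"    # assumed: your home directory on lfe
    case "$src" in
        "$scratch_root"/*)
            # Strip the scratch root and re-anchor the rest under the backup root
            printf '%s/%s\n' "$backup_root" "${src#"$scratch_root"/}"
            ;;
        *)
            echo "error: $src is not under $scratch_root" >&2
            return 1
            ;;
    esac
}

# Example: where should a scratch directory land in the backup?
mirror_path "/nobackup/$USER/models/STGFlatPlate" "/nobackup/$USER" "$HOME"
```

The printed path can then be used as the destination of the transfer, so that a later restore only has to swap the two roots.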
This can be done with `scp`, but it is recommended to use NASA's in-house utility `shiftc`. `shiftc` automatically performs parallel file transfers and data integrity checks and repairs, and offers syncing features similar to `rsync`.
Commands:
jrwrigh7@lfe7: shiftc -r -d --sync /nobackup/jrwrigh7/models/STGFlatPlate/STFM_Tet_dz4-10_dx15 .
This will copy the directory `STFM_Tet_dz4-10_dx15` to the current location (`.`). The flags do the following:
- `-r`: Recursively copy files from the source.
- `-d`: Create required directories that don't already exist. Equivalent to the `-p` flag for `mkdir`.
- `--sync`: Only copy over "new" files, where "new" means any file whose modification time or file size has changed.
  - If a file exists on the destination (`.`) but not on the source (`STFM_Tet_dz4-10_dx15`), it will not be copied back to the source, nor will it be deleted to match the state of the source.
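The "new" rule for `--sync` described above can be illustrated with a small check. This is only a sketch of the rule as stated on this page; `needs_copy` is a hypothetical helper and does not reflect how `shiftc` is actually implemented. It assumes GNU `stat`.

```shell
#!/usr/bin/env bash
# Illustration only: a file counts as "new" if it is missing on the
# destination, or if its size or modification time differs from the source.
# needs_copy is a hypothetical helper, not part of shiftc.
needs_copy() {
    local src="$1" dst="$2"
    # Missing on the destination: must be copied
    [ -e "$dst" ] || return 0
    # GNU stat: %s = size in bytes, %Y = mtime in seconds since the epoch
    [ "$(stat -c '%s %Y' "$src")" != "$(stat -c '%s %Y' "$dst")" ]
}

# Usage: needs_copy SOURCE_FILE DEST_FILE && echo "would be copied"
```

Note that the check only ever looks from source to destination, which matches the one-directional behavior described in the last bullet.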
Once this command is submitted, the transfer runs in the background. Progress can be viewed by running `shiftc --monitor`. Additionally, you will receive an email when the transfer job is completed.
jrwrigh7@lfe7: shiftc --stop --id [shiftc job ID]
This will stop the given `shiftc` job. The `[shiftc job ID]` is the number shown for the job in the output of `shiftc --monitor`.
More documentation for `shiftc` can be found in its man page (`man shiftc`) and on NAS's documentation website.
Control MPI Rank Placement
Rank 1 Solo Node
To make the rank 1 (first) MPI process take a node on its own, put this in the PBS directives:
#PBS -l select=1:mpiprocs=1:model=sky_ele+1:mpiprocs=40:model=sky_ele
This will request 2 nodes: one will have the rank 1 process all by itself, and the other will have 40 MPI processes (one for each of the 40 CPU cores available on `sky_ele` nodes).
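As a sanity check on a select statement like the one above, the total number of MPI ranks it requests can be summed chunk by chunk. `count_ranks` is a hypothetical helper (not a PBS or NAS command), and it assumes a chunk without `mpiprocs` contributes one rank per node.

```shell
#!/usr/bin/env bash
# Sketch: sum the MPI ranks requested by a PBS select statement, where each
# '+'-separated chunk has the form <count>:mpiprocs=<n>[:other=resources].
# count_ranks is a hypothetical helper, not a PBS or NAS command.
count_ranks() {
    local total=0 chunk count procs
    local IFS='+'                       # split the statement into chunks
    for chunk in $1; do
        count="${chunk%%:*}"            # leading node count of the chunk
        procs="$(printf '%s\n' "$chunk" | sed -n 's/.*mpiprocs=\([0-9][0-9]*\).*/\1/p')"
        # Assumption: a chunk with no mpiprocs contributes 1 rank per node
        total=$(( total + count * ${procs:-1} ))
    done
    echo "$total"
}

count_ranks "1:mpiprocs=1:model=sky_ele+1:mpiprocs=40:model=sky_ele"   # prints 41
```

For the directive above this gives 1 + 40 = 41 ranks across the 2 nodes.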
Distribute Non-First Rank MPI Processes
To control the placement of non-first rank MPI processes, use the `mbind.x` utility.
For example, if we have requested 4 nodes and want 10 MPI processes per node, the `mpiexec` command needs to be modified to the following:
mpiexec -np 40 /u/scicon/tools/bin/mbind.x -n10 [executable]
Note that `mbind.x` is also socket aware, so it will distribute processes evenly between nodes and between the CPUs in each node (NAS nodes have 2 CPUs per node).
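The arithmetic behind the example (4 nodes × 10 processes per node = 40 total) can be sketched as a small helper that prints the command line. `mbind_cmd` is a hypothetical function for illustration; the `mbind.x` path and the `-n` flag are taken from the example above, and `./a.out` is a placeholder executable.

```shell
#!/usr/bin/env bash
# Sketch: derive the total rank count from the node count and the ranks per
# node, then print the matching mpiexec line. mbind_cmd is a hypothetical
# helper; it only prints the command and does not run it.
mbind_cmd() {
    local nodes="$1" per_node="$2" exe="$3"
    printf 'mpiexec -np %d /u/scicon/tools/bin/mbind.x -n%d %s\n' \
        "$(( nodes * per_node ))" "$per_node" "$exe"
}

mbind_cmd 4 10 ./a.out
# prints: mpiexec -np 40 /u/scicon/tools/bin/mbind.x -n10 ./a.out
```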
For more information on mbind.x
, see it's help flag (mbind.x -help
) or NAS's documentation website.
See Priority "Score" in Queue
To see your priority "score" in PBS, use `qstat -W o=+pri`, which adds the "Priority" column to the output of `qstat`.
Priority Scoring (as of 2021-01-22)
- Job priority score grows by 1 every 12 hours
- We are capped at a max score of 20 per job
- Note that other users/groups using NAS may start with higher priority and grow higher than 20
- Result is that it's quite difficult to get large jobs running
- If you don't have any jobs running, you get an additional +10 to the score
- This score bump is removed as soon as you have a running job
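The rules above can be turned into a rough estimate of a job's score. This is only a sketch under assumptions that this page does not confirm: the score is assumed to start at 0, and the +10 bump is assumed to be added on top of the cap of 20. `priority_score` is a hypothetical helper, not a PBS command.

```shell
#!/usr/bin/env bash
# Rough estimate of the priority score from the rules listed above.
# ASSUMPTIONS (not confirmed by the source): the score starts at 0, and the
# +10 idle bump is added on top of the per-job cap of 20.
priority_score() {
    local hours_queued="$1"       # whole hours the job has been queued
    local has_running_job="$2"    # "yes" or "no"
    local score=$(( hours_queued / 12 ))   # +1 per full 12 hours
    if (( score > 20 )); then
        score=20                           # per-job cap
    fi
    if [ "$has_running_job" = "no" ]; then
        score=$(( score + 10 ))            # bump while nothing is running
    fi
    echo "$score"
}

priority_score 36 no    # 3 from waiting + 10 bump: prints 13
```

Compare the estimate against the "Priority" column reported by `qstat` before relying on it.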