ALCF/Archiving Data at ALCF
ALCF's High Performance Storage System (HPSS) is a robotic tape drive system used to archive large amounts of data that will not be accessed often. The ALCF documentation lists two interfaces to the system, hsi and htar. This page focuses on the hsi interface.
HSI Basics
hsi is a utility for interacting with HPSS. It looks and operates much like a typical bash command line, with some added complexities. When you enter hsi, you are placed in your HPSS home space at /home/username. hsi keeps track of both your location in HPSS and your location on the "local" system that you are running hsi from. The "local" directory is automatically set to the directory from which you started the utility.
Navigation through HPSS works the same as on a normal command line, with ls, cd, and mkdir, among others, being valid commands. If you need to navigate through the "local" directories, prepend an "l" to these standard commands (e.g. lls, lcd, and lmkdir). This can be useful if you entered hsi from the wrong directory or wish to archive data located in multiple places.
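As a rough illustration of the two working directories, the sketch below hands a short sequence of navigation commands to hsi in a single call. It assumes the installed hsi accepts a quoted, semicolon-separated command string on the command line, which has not been verified on ALCF here. The names project_archive and /tmp are hypothetical. The same commands can be typed one at a time at the interactive prompt.

```shell
#!/usr/bin/env bash
# Hypothetical directory names; replace with your own.
hpss_dir="project_archive"   # directory to create in HPSS
local_dir="/tmp"             # directory on the "local" system

# pwd, mkdir, and cd act on HPSS; lpwd, lcd, and lls act on the "local" system.
nav_cmds="pwd; lpwd; mkdir ${hpss_dir}; cd ${hpss_dir}; lcd ${local_dir}; lls"

if command -v hsi >/dev/null 2>&1; then
    # Assumption: hsi runs a quoted command string passed as its argument.
    hsi "${nav_cmds}"
else
    # hsi is not installed here, so only show what would be run.
    echo "hsi \"${nav_cmds}\""
fi
```

After these commands, the HPSS location is the new directory and the "local" location is /tmp, so a following put or cput would read files from /tmp.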
Note that hsi does not support tab completion or recalling past commands with the up arrow. Enter the utility with a defined plan to reduce the annoyance that the lack of these conveniences can cause.
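One possible way to prepare such a plan is to write the commands to a text file beforehand and review them before starting hsi. The sketch below only creates and prints the file. The file name hsi_plan.txt and the directory names run_2022 and results are hypothetical. The commented-out last line assumes that hsi provides an "in" command for reading commands from a file, which should be confirmed against the reference manual for the installed version.

```shell
#!/usr/bin/env bash
plan_file="hsi_plan.txt"   # hypothetical file name

# Write the commands in the order they would be typed at the hsi prompt.
cat > "${plan_file}" <<'EOF'
mkdir run_2022
cd run_2022
cput -R results
EOF

# Review the plan before starting hsi.
cat "${plan_file}"

# Not executed here. Assumption: hsi's "in" command reads commands from a file.
# hsi "in ${plan_file}"
```

Even if the file is never passed to hsi directly, it serves as a checklist to copy from while working at the prompt.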
Archiving of Data
Once the destination directory for the data has been created and/or navigated to, there are a few options and considerations to actually archive the data. These are most notably:
- put
- cput
put is the most basic archiving command and will overwrite any copies of the files already on HPSS. cput is a conditional version of put that only overwrites a file if the "local" version is newer than the one already on HPSS. This makes cput the tool of choice for updating partially archived datasets, and because it otherwise behaves like put, it is also the recommended default.
put and cput share the same syntax, so the following applies to both, with cput used in the examples. The examples assume that you have already run hsi and are at its prompt.
Simple usage to store a single file is:
cput <filename>
To change the name of a file as it is archived:
cput <filename> : <newFilename>
To store a whole directory while keeping the parent directory intact:
cput -R <dirName>
To store only the contents of the current "local" directory and everything beneath it:
cput -R "*"
Be mindful of your "local" directory location when choosing between these options.
If you need to retrieve data from tape and place it back on the "local" system, the get and cget commands work the same way as put and cput, but in reverse.
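A minimal retrieval sketch that mirrors the cput examples above is shown below. The names results.tar and run_2022 are hypothetical and stand for items that already exist in the current HPSS directory. As before, it assumes hsi accepts a quoted command string on the command line. At the interactive prompt, the two cget commands would simply be typed one after the other.

```shell
#!/usr/bin/env bash
# Hypothetical names; replace with items that exist in your HPSS space.
hpss_file="results.tar"
hpss_dir="run_2022"

# cget mirrors cput: first a single file, then a whole directory with -R.
get_cmds="cget ${hpss_file}; cget -R ${hpss_dir}"

if command -v hsi >/dev/null 2>&1; then
    # Assumption: hsi runs a quoted command string passed as its argument.
    hsi "${get_cmds}"
else
    # hsi is not installed here, so only show what would be run.
    echo "hsi \"${get_cmds}\""
fi
```

Retrieved files land in the current "local" directory, so check it with lpwd or change it with lcd before retrieving.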
Troubleshooting
If you are a first-time user of HPSS, you will likely get an error regarding a key file. This must be resolved by ALCF support (support@alcf.anl.gov). Email them your ALCF username and state that you need access set up for HPSS.
Further Documentation
While the standard use cases of hsi are covered above, more thorough documentation is available from NERSC.
There is also a PDF reference manual (backup saved here: File:HSI 8.3 Reference Manual.pdf) that goes into more detail. Note that it is for version 8.3, while ALCF is currently (as of 2022-10-04) running 7.4.