ALCF/Archiving Data at ALCF


ALCF's High Performance Storage System (HPSS) is a robotic tape drive system used to archive large amounts of data that will not be accessed often. The ALCF documentation lists two interfaces to the system, hsi and htar. This page focuses on the hsi interface.

HSI Basics

hsi is a utility for interacting with the HPSS system. It looks and operates much like a typical bash command line, but with some added complexities. When you enter hsi, the system places you in your HPSS home space at /home/username. hsi keeps track of both your location in this HPSS space and your location on the "local" system that you are running hsi from. The "local" directory is automatically set to the directory from which you started the utility.

Navigation through HPSS operates the same as on a normal command line, with ls, cd, and mkdir among the valid commands. If you need to navigate through the "local" directories instead, prefix these standard commands with an "l" (e.g. lls, lcd, and lmkdir). This can be useful if you entered hsi from the wrong directory or wish to archive data from multiple locations.
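As an illustration of mixing HPSS-side and "local" commands, the sketch below builds a semicolon-separated command list that could be passed to hsi as a single argument. It assumes the hsi version in use accepts commands this way (one-line mode) rather than only interactively. The helper join_hsi_cmds and the directory names are hypothetical, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# join_hsi_cmds is a hypothetical helper, not part of hsi.
# It joins its arguments into one "; "-separated command string.
join_hsi_cmds() {
    joined=""
    for cmd in "$@"; do
        if [ -z "$joined" ]; then
            joined="$cmd"
        else
            joined="$joined; $cmd"
        fi
    done
    printf '%s\n' "$joined"
}

# HPSS-side commands (mkdir, cd, ls) mixed with "local" ones (lcd, lls).
# The directory names are placeholders.
session=$(join_hsi_cmds "mkdir run01" "cd run01" "lcd /path/to/local/run01" "lls" "ls")

# Print the command rather than running it, so it can be inspected first.
echo "hsi \"$session\""
```

Once the printed command looks right, it can be pasted into a shell on a system where hsi is available. The same commands can also be typed one at a time at the hsi prompt.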

Note that hsi does not support tab completion or recalling past commands with the up arrow. Entering the utility with a defined plan reduces the annoyance that the lack of these features can cause.

Archiving of Data

Once the destination directory for the data has been created and/or navigated to, there are two main commands for archiving the data:

  • put
  • cput

put is the most basic archiving command, and will overwrite any copies of the archived files that already exist on HPSS. cput is a conditional version of put that will only overwrite a file if the "local" version is newer than the one already on HPSS. This makes cput the tool of choice for updating partially archived datasets, and since it otherwise behaves like put, it is also the recommended default.

Both put and cput have similar syntax, and the following will cover both, but cput will be used as an example. It is assumed that the hsi command has already been run to enter the hsi utility before attempting the following.

Simple usage to store a single file is:

cput <filename>

To change the name of a file as it is archived:

cput <filename> : <newFilename>

Whole directories can be stored using:

cput -R <dirName>

if you wish to keep the parent directory intact, or with:

cput -R "*"

if you simply want to store the contents of the current "local" directory and everything beneath it. Be mindful of your "local" directory location when choosing between these options.
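The choice between the two forms depends on where the "local" directory is set. The sketch below shows one way to make that explicit: it sets the "local" directory to the parent of the data and then archives the directory by name, which keeps the parent directory intact on HPSS. It assumes hsi accepts a semicolon-separated command list as a single argument and that the HPSS destination directory already exists. build_archive_cmd and the paths are hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# build_archive_cmd is a hypothetical helper, not part of hsi.
#   $1: "local" directory to archive
#   $2: existing HPSS destination directory
build_archive_cmd() {
    local_dir=$1
    hpss_dest=$2
    parent=$(dirname "$local_dir")   # where the "local" side should sit
    name=$(basename "$local_dir")    # directory name to pass to cput -R
    printf 'lcd %s; cd %s; cput -R %s\n' "$parent" "$hpss_dest" "$name"
}

# Placeholder paths; print the command rather than running it.
cmd=$(build_archive_cmd /path/to/local/run01 archive)
echo "hsi \"$cmd\""
```

To store only the contents instead, the "local" directory would be set to the data directory itself and cput -R "*" used in place of the named directory.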

If you need to retrieve data from tape and put it back on the "local" system, the get and cget commands act in the same way as put and cput but in reverse.
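Retrieval can be sketched the same way. The example below builds a command list that moves to the HPSS directory holding the archive, sets the "local" directory to the restore location, and fetches a directory with cget. The use of -R with cget is assumed by analogy with cput, as is hsi's acceptance of a semicolon-separated command list as one argument. build_retrieve_cmd and the paths are hypothetical.

```shell
#!/bin/sh
# build_retrieve_cmd is a hypothetical helper, not part of hsi.
#   $1: HPSS directory that contains the archived data
#   $2: name of the archived directory to fetch
#   $3: existing "local" directory to restore into
build_retrieve_cmd() {
    hpss_dir=$1
    name=$2
    local_dest=$3
    printf 'cd %s; lcd %s; cget -R %s\n' "$hpss_dir" "$local_dest" "$name"
}

# Placeholder paths; print the command rather than running it.
cmd=$(build_retrieve_cmd archive run01 /path/to/local/restore)
echo "hsi \"$cmd\""
```

Files coming back from tape may take a while to become available, so it is worth checking the printed command before running it.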

Troubleshooting

If you are a first-time user of HPSS, you will likely get an error regarding a key file. This must be resolved by ALCF support (support@alcf.anl.gov). Email them with your ALCF username and state that you need access set up for HPSS.

Further Documentation

While the standard use cases of hsi are covered above, more thorough documentation is available from NERSC.

There is also a pdf reference manual (backup saved here: File:HSI 8.3 Reference Manual.pdf) that goes into more detail. Note that it is for version 8.3, while ALCF is currently (as of 2022-10-04) running 7.4.