Difference between revisions of "ALCF/Archiving Data at ALCF"

From PHASTA Wiki

Revision as of 14:02, 22 August 2022

ALCF's High Performance Storage System (HPSS) is a robotic tape drive system used to archive large amounts of data that will not be accessed often. The ALCF documentation (https://www.alcf.anl.gov/support-center/theta/using-hpss-theta) lists two interfaces to the system, hsi and htar. This wiki focuses on the hsi interface.

HSI Basics

HSI is a utility for interfacing with the HPSS system. It looks and operates much like a typical bash command line, but with some added complexities. When you enter hsi, the system places you in your home HPSS space at /home/username. hsi keeps track of both your location in this HPSS space and your location on the "local" system that you are running hsi from. The "local" directory is automatically set to the directory from which you entered the utility.

Navigation through HPSS operates the same as on a normal command line, with ls, cd, and mkdir, among others, being valid commands. If you need to navigate through the "local" directories, this can be done by prepending an "l" to these standard commands (i.e. lls, lcd, and lmkdir). This can be useful if you entered hsi from the wrong directory or wish to archive data from multiple locations.
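As a rough illustration of the two sets of navigation commands, the sketch below builds a single command string that creates and enters an HPSS directory, then moves the "local" side and lists it. The directory names run_042 and /path/to/results are hypothetical, and passing a quoted, ";"-separated command string to hsi on the shell command line is an assumption not covered in this wiki, so the actual hsi call is left commented out.

```shell
# build_nav_cmds HPSS_DIR LOCAL_DIR
# Prints a ";"-separated hsi command string that creates and enters HPSS_DIR,
# lists it, then moves the "local" side to LOCAL_DIR and lists that too.
build_nav_cmds() {
    printf 'mkdir %s; cd %s; ls; lcd %s; lls\n' "$1" "$1" "$2"
}

# Hypothetical names, for illustration only.
nav_cmds=$(build_nav_cmds run_042 /path/to/results)
echo "$nav_cmds"

# Assumption: hsi accepts commands as a quoted argument.
# Uncomment on a system where hsi is installed:
# hsi "$nav_cmds"
```

The same commands can also be typed one at a time at the hsi prompt; the un-prefixed ones act on HPSS and the "l"-prefixed ones act on the "local" system.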

Note that hsi does not support tab completion or up-arrowing to recall past commands. It is recommended that you enter the utility with a defined plan to reduce the annoyance that the lack of these conveniences can cause.
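One way to arrive with a defined plan is to write the intended commands to a file and review them before starting. The sketch below does this with hypothetical names (hsi_plan.txt, run_042, /path/to/results). Running the file through hsi's "in" command is an assumption not covered in this wiki, so that line is left commented out; the file can equally be used as a checklist while typing commands at the prompt.

```shell
plan_file=hsi_plan.txt   # hypothetical file name

# One hsi command per line, exactly as it would be typed at the hsi prompt.
cat > "$plan_file" <<'EOF'
mkdir run_042
cd run_042
lcd /path/to/results
cput -R "*"
ls
EOF

# Review the plan before running anything.
cat "$plan_file"

# Assumption: hsi's "in" command executes the commands listed in a file.
# Uncomment on a system where hsi is installed:
# hsi "in $plan_file"
```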

While standard use cases of hsi are covered below, more documentation (which is more thorough and helpful than the ALCF documentation) is available here: [2].

Archiving of Data

Once the destination directory for the data has been created and/or navigated to, there are two main commands for archiving the data:

  • put
  • cput

put is the most basic archiving tool, and will overwrite any versions of the files being archived that are already on HPSS. cput is a conditional version of put that will only overwrite a file if the "local" version is newer than the one already on HPSS. This makes cput the tool of choice for updating partially archived datasets. Because it otherwise behaves like put, it is also the recommended default command.

put and cput have similar syntax, so the following applies to both, with cput used in the examples. It is assumed that the hsi command has already been run to enter the hsi utility before attempting the following.

Simple usage to store a single file is:

cput <filename>

To change the name of a file as it is archived:

cput <filename> : <newFilename>

To store a whole directory while keeping the directory itself intact on HPSS:

cput -R <dirName>

To store only the contents of the current "local" directory and everything beneath it:

cput -R "*"

Be mindful of your "local" directory location when choosing between these options.
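The difference between the two directory forms comes down to where the "local" directory is set before cput runs. The sketch below is a hypothetical helper, build_archive_cmds, that only prints the hsi commands for each case so they can be checked before being typed at the hsi prompt; it does not call hsi itself, and /path/to/results is a placeholder.

```shell
# build_archive_cmds MODE LOCAL_DIR
#   MODE=keep     : archive LOCAL_DIR itself (its name is kept on HPSS)
#   MODE=contents : archive only what is inside LOCAL_DIR
# Prints one hsi command per line; nothing is sent to HPSS.
build_archive_cmds() {
    mode=$1
    local_dir=$2
    case "$mode" in
        keep)
            # Sit in the parent of LOCAL_DIR and name the directory.
            printf 'lcd %s\n' "$(dirname "$local_dir")"
            printf 'cput -R %s\n' "$(basename "$local_dir")"
            ;;
        contents)
            # Sit inside LOCAL_DIR and use the wildcard form.
            printf 'lcd %s\n' "$local_dir"
            printf 'cput -R "*"\n'
            ;;
        *)
            echo "usage: build_archive_cmds keep|contents LOCAL_DIR" >&2
            return 1
            ;;
    esac
}

# Hypothetical local path, for illustration only.
build_archive_cmds keep /path/to/results
build_archive_cmds contents /path/to/results
```

In both cases the files land in whatever HPSS directory is current, so cd to the destination in hsi first.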