ALCF/Archiving Data at ALCF
Latest revision as of 09:44, 4 October 2022
ALCF's High Performance Storage System (HPSS) is a robotic tape system used to archive large amounts of data that will not be accessed often. The ALCF documentation (https://www.alcf.anl.gov/support/user-guides/data-management/filesystem-and-storage/hpss/index.html) lists two interfaces to the system, hsi and htar. This page focuses on the hsi interface.
HSI Basics
hsi is a utility for interacting with HPSS. It looks and behaves much like a typical bash command line, with some added complexity. When you start hsi, you are placed in your HPSS home space at /home/username. hsi tracks two locations: your current directory in HPSS and your current directory on the "local" system that you are running hsi from. The "local" directory is initially set to the directory from which you started hsi.
Navigation through HPSS works the same as on a normal command line: ls, cd, and mkdir, among others, are valid commands. To navigate the "local" directories instead, prefix these standard commands with an "l" (e.g. lls, lcd, and lmkdir). This is useful if you started hsi from the wrong directory or want to archive data from multiple locations.
Note that hsi does not support tab completion or recalling past commands with the up arrow. Entering the utility with a defined plan reduces the annoyance that the lack of these conveniences can cause.
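One way to act on such a plan is to write the commands out ahead of time and hand them to hsi in a single call. The sketch below assumes that the installed hsi accepts a quoted, semicolon-separated command list as its argument (as described in the NERSC documentation; this has not been checked against the ALCF installation). The names run01 and results.tar and the RUN_HSI switch are hypothetical placeholders.

```shell
# Sketch: join a planned list of hsi commands into one non-interactive call.
# Assumption: hsi accepts a quoted, semicolon-separated command list.

# Join all arguments with "; " and print the resulting command string.
join_hsi_commands() {
    joined=""
    for cmd in "$@"; do
        if [ -z "$joined" ]; then
            joined="$cmd"
        else
            joined="$joined; $cmd"
        fi
    done
    printf '%s\n' "$joined"
}

# Hypothetical plan: the directory and file names are placeholders.
plan=$(join_hsi_commands "mkdir run01" "cd run01" "cput results.tar")

# By default only print the call for review; set RUN_HSI=1 to execute it.
if [ "${RUN_HSI:-0}" = "1" ]; then
    hsi "$plan"
else
    printf 'hsi "%s"\n' "$plan"
fi
```

With RUN_HSI unset, this only prints hsi "mkdir run01; cd run01; cput results.tar", so the plan can be reviewed before anything is sent to tape.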
Archiving of Data
Once you have created and/or navigated to the destination directory in HPSS, there are a few options to consider for actually archiving the data. The most notable commands are:
-  put
-  cput
put is the most basic archiving command and overwrites any copies of the archived files that already exist on HPSS. cput is a conditional version of put that overwrites a file only if the "local" version is newer than the one already on HPSS. This makes cput the tool of choice for updating partially archived datasets, and because it otherwise behaves like put, it is also the recommended default.
put and cput share the same syntax, so the examples below apply to both; cput is used throughout. It is assumed that you have already run the hsi command to enter the hsi utility.
Simple usage to store a single file is:
cput <filename>
To change the name of a file as it is archived:
cput <filename> : <newFilename>
Whole directories can be stored in two ways. To keep the parent directory intact, use:
cput -R <dirName>
To store only the contents of the current "local" directory and everything beneath it, use:
cput -R "*"
Be mindful of your "local" directory location when choosing between these options.
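The difference between the two directory forms comes down to where the "local" directory points when cput runs. As an illustrative sketch, the helper below prints the commands one would type at the hsi prompt for each case; the function name plan_dir_archive, the mode names, and the paths are hypothetical, and nothing is sent to hsi.

```shell
# Sketch: print the hsi prompt commands for archiving a local directory.
# $1 = mode ("keep" or "contents"), $2 = local parent directory, $3 = directory name
plan_dir_archive() {
    case "$1" in
        keep)
            # Stay in the parent so the directory itself is recreated on HPSS.
            printf 'lcd %s\n' "$2"
            printf 'cput -R %s\n' "$3"
            ;;
        contents)
            # Step into the directory so only its contents are stored.
            printf 'lcd %s/%s\n' "$2" "$3"
            printf 'cput -R "*"\n'
            ;;
        *)
            echo "usage: plan_dir_archive keep|contents <parent> <dir>" >&2
            return 1
            ;;
    esac
}

# Placeholder paths; both calls only print the commands.
plan_dir_archive keep /path/to/project run01
plan_dir_archive contents /path/to/project run01
```

In the first case the files should end up under run01 in the current HPSS directory; in the second they are placed directly in the current HPSS directory.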
If you need to retrieve data from tape and put it back on the "local" system, the get and cget commands act in the same way as put and cput but in reverse.
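Because retrieval mirrors archiving, a command used to store data can be turned into the matching retrieval command by swapping the verb. The following is a minimal sketch of that mapping; the function name to_retrieve_cmd and the file names are hypothetical, and the result is only printed, not run.

```shell
# Sketch: map a put/cput command to the matching get/cget command.
# Only the leading verb changes; options and names are passed through as-is.
to_retrieve_cmd() {
    case "$1" in
        cput\ *) printf 'cget %s\n' "${1#cput }" ;;
        put\ *)  printf 'get %s\n' "${1#put }" ;;
        *)
            echo "not a put/cput command: $1" >&2
            return 1
            ;;
    esac
}

# Placeholder names; these print "cget results.tar" and "cget -R run01".
to_retrieve_cmd "cput results.tar"
to_retrieve_cmd "cput -R run01"
```

As with cput, cget is the conservative choice when some of the files may already exist at the destination.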
Troubleshooting
If you are a first-time user of HPSS, you will likely get an error regarding a key file. This must be resolved by ALCF support (support@alcf.anl.gov). Email them with your ALCF username and state that you need access set up for HPSS.
Further Documentation
While the standard use cases of hsi are covered above, more thorough documentation is available from NERSC (https://docs.nersc.gov/filesystems/archive/).
There is also a PDF reference manual (https://www.hpss-collaboration.org/documents/HSI_8.3_Reference_Manual.pdf; backup saved here: File:HSI 8.3 Reference Manual.pdf) that goes into more detail. Note that it is for version 8.3, while ALCF is currently (as of 2022-10-04) running 7.4.
