
HSM (HPSS) for users of the IBM Power6 High-Performance Computer 'vip'

 

Questions & Answers 

 

  • About the /r migrating filesystem - a Hierarchical Storage Management (HSM) using HPSS

     

  • About Backups and Archives



     

    About the /r migrating filesystem - a Hierarchical Storage Management (HSM) using HPSS


    The RZG has installed a migrating filesystem for the IBM Power6 High-Performance Computer. Data written to this filesystem will automatically be moved from disk to tape to free space on disk when necessary or back from tape to disk when needed again by the user. This is similar to the functionality offered by the m-tree in AFS.


    Here is some information about the workings and usage of this filesystem:

     

    • The migrating filesystem is called /ghi. It can be accessed from all vip nodes. Each user has a subdirectory /ghi/r/<initial>/<userid> to store his/her data. There is also a symbolic link /r pointing to /ghi/r, so in practice a user with ID smith would work with /r/s/smith.

       

    • The system constantly monitors the fill level of the filesystem. When usage rises above a certain threshold, files are transferred from disk to tape, beginning with the largest files that have been unused for the longest time.

       

    • If (by using some program or command) you access a file which has been migrated to tape, the file will automatically be transferred back from tape to disk. This of course implies a certain delay. The command will appear to hang, but it will just wait until the data is online and then continue.

       

    • You can manually force the recall of a migrated file by using any command which opens the file. You can recall in advance all files needed by some job with a command like
      file myfiles/*

       

    • You can see which files are resident on disk and which ones have been migrated to tape with the command ghi_ls (located in /usr/local/bin), optionally with the option -l. Here is a sample output:
      vip001% ghi_ls -l
      G -rw-r--r--   1  ifw       rzs             22 Nov 21 15:12 a1
      H -rw-------   1  ifw       rzs   138958551040 Sep 18 22:22 abc.tar
      H -rw-r--r--   1  ifw       rzs     1073741312 May 06 2009  core
      G -rw-r--r--   1  ifw       rzs              0 Jun 20 2008  dsmerror.log
      B -rw-r--r--   1  ifw       rzs     1079040000 Aug 03 2010  dummyz3
      The first column states where the file resides: a 'G' means the file is resident on the GPFS disk; an 'H' means the file has been transferred to the underlying HPSS archiving system, probably on tape; a 'B' means 'both': the file has already been copied to HPSS but is still present on disk and can be removed immediately if the system needs to free disk space.

       

    • The system can only migrate files that are larger than the disk block size, which for this filesystem is 1 MB (one megabyte). Files smaller than 1 MB stay resident on disk, permanently occupying disk space and, worse, making the total number of files grow so large that operations such as scanning the filesystem for backups become increasingly slow.

      In addition: while files larger than 1 MB can be migrated, the system works efficiently only for files larger than about 1 GB (one gigabyte). The reason is that reading from or writing to tape means waiting for a tape drive to become available, then for a tape to be mounted in the drive, and then for the tape to be rewound or positioned. This can typically take several minutes. Once a tape is available and in position, the system can read or write data very fast: a 1 GB file can be read in under 10 seconds. Contrast this with reading 1 GB of data spread across 1000 files of 1 MB each, which would need at the very least 1000 tape-positioning operations and possibly the mounting of several tapes (maybe hundreds!).

      For these reasons, all users are kindly asked to keep only large files on this filesystem. If you have many small files, please first pack them into one large file with a suitable tool such as tar, cpio, ar or zip. Please try to keep the size of files stored on the '/r' filesystem within a range of about 1 GB (one gigabyte) to about 500 GB (five hundred gigabytes).

      Here is a simple example of how to use tar to pack some small files small000, small001, etc. into a big file big.tar:

      tar cvf big.tar small*

       

    • Every file being migrated gets simultaneously written to two different tapes. In this way, in case of a tape failure while reading back the data from the first tape, the file can probably still be read from the second tape.
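
The first column of the ghi_ls -l output described above can be filtered to find the files whose access would trigger a recall from tape. The following is only a minimal sketch: it assumes the column layout of the sample output shown above and file names without whitespace, and the function name migrated_files is hypothetical.

```shell
#!/bin/sh
# migrated_files: read "ghi_ls -l" output on standard input and print the
# names of the files whose first column is "H" (migrated to HPSS).
# Assumption: the file name is the 10th whitespace-separated field, as in
# the sample output, and contains no whitespace itself.
migrated_files() {
    awk '$1 == "H" { print $10 }'
}

# Possible use on a vip node (not run here):
#   ghi_ls -l | migrated_files
```

On a vip node, something like ghi_ls -l | migrated_files | xargs file would then open only the migrated files and so recall them in advance.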

     
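
Before packing, it can help to know how many files in a directory tree are too small to be migrated. The sketch below counts regular files below roughly 1 MB using only the standard find size test in 512-byte blocks. The function name count_small_files is hypothetical, and file names containing newlines would be miscounted.

```shell
#!/bin/sh
# count_small_files DIR
# Print the number of regular files under DIR that occupy fewer than
# 2048 blocks of 512 bytes (find rounds each file size up to whole blocks),
# i.e. files smaller than roughly 1 MB.
count_small_files() {
    find "$1" -type f -size -2048 -print | wc -l | tr -d ' '
}
```

If count_small_files mydir reports a large number, the directory is a candidate for packing, for example with tar cvf mydir.tar mydir. Listing the result with tar tvf mydir.tar before deleting the originals is a cheap check that the archive is readable.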



 


 

About Backups and Archives

  1. I am a user of the vip system at the RZG. Do my files get backed up?

    No, no backups of user files are done. To protect your data, keep an additional copy of all important data on another system, e.g. in your AFS volume.

  2. How can I archive data with TSM in order to make room on the vip disks?

    Log in to the vip machine. Run adsm local user. Click on Archive files and directories into long term storage. Click on the + symbol on the left of Local and look for the directory you are interested in by further clicking on the + symbols. When you have found the directory of interest, click on the small folder symbol right of the +, left of the directory name in order to get the list of files contained in that directory.

    Then mark the files by clicking on the small grey square at the very left of the file name. You can also mark directories. Enter a description in the entry field above the directory tree. It is advisable to enter a different and meaningful description for every archive you make, which will help you keep your data organized. (The description cannot be changed later. If you later want a different description, you have to archive the data again!)

    Then click on the Archive button which is located above the description. After the archiving is complete, you can delete the data from the disk (maybe you want to check the archive first, see next point).

  3. How can I see / how can I retrieve my archived data?

    Log in to any Unix machine at the RZG (it does not have to be the vip machine). Run adsm local user. Click on Retrieve files and directories from long term storage. You see a list of all archives you have made, sorted by description. To see the contents of an archive, click on the + symbol on the left of the archive name and look for the directory you are interested in by further clicking on the + symbols. When you have found the directory of interest, click on the small folder symbol right of the +, left of the directory name in order to get the list of files contained in that directory.

    If you want to retrieve data, then first make sure you are on the computer where you want to write the data to. Then, after finding the data as described above, mark files or directories by clicking on the small grey square at the very left of the file or directory name. Then click on the Retrieve button which is located above the directory tree. Specify in the next dialog window a location for the retrieved files and click on Retrieve. Wait for the files to be copied from tape to disk.

  4. How can I delete archived data?

    Log in to any Unix machine at the RZG (it does not have to be the vip machine). Run adsm local user. Click on the menu entry Utilities -> Delete Archive Data. Proceed as in point 3 to find and mark the data you want to delete. Then click on the Delete button.

  5. How can I add new files to an existing archive?

    If you archive some data using exactly the same description string as that of an existing archive, the old and the new data will appear merged together as one archive. Note that if a file in the existing archive is archived again, the archive will contain that file twice: the new copy does not overwrite the old one. You can tell them apart by their archive dates.

  6. How long is archived data kept in the TSM Server?

    Archived data is kept with no time limit until you explicitly delete it. Please do delete archived data when you no longer need it.

    By the way, for added safety, the TSM Server keeps two copies of each archive.

  7. How many versions of archived data are kept?

    Versioning does not apply to archives. Each archive is a separate set of data which does not in any way overwrite or supersede other data you might have archived before. If you make an archive and later you archive the same data again, then you have two different archives with the same contents. Both archives are kept until you explicitly delete them.

  8. I would prefer to use the command line to manage my archives. Is that possible?

    Yes. Here are some examples. Type each command on one line. Note the use of adsmc instead of adsm:

    • To make an archive, use a command like:
      adsmc local user 'archive -description="My vip data"
        -subdir=yes /u/username/mydir/*' >& mylogfile &
    • To display a list of your archives:
      adsmc local user 'query archive /*'
    • To display the contents of a particular archive:
      adsmc local user 'query archive -description="My vip data"
        -subdir=yes /*'
    • To retrieve archive data:
      adsmc local user 'retrieve -description="My vip data" 
        -subdir=yes /u/username/mydir/* /u/username/mytargetdir/'
    • To delete archived data:
      adsmc local user 'delete archive -description="My vip data"
        -subdir=yes /*'
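
Since a description cannot be changed later and a distinct description per archive is advisable, one option is to build the description from a label plus the current date. The following sketch only prints the resulting archive command in the form shown above and does not run it. The function name archive_cmd and the date-suffix convention are assumptions, not part of TSM, and labels containing quote characters are not handled.

```shell
#!/bin/sh
# archive_cmd LABEL DIR
# Print (but do not execute) an adsmc archive command whose description is
# LABEL followed by today's date, so that each day's archive gets a
# distinct description. LABEL must not contain quote characters.
archive_cmd() {
    label=$1
    dir=$2
    desc="$label $(date +%Y-%m-%d)"
    printf "adsmc local user 'archive -description=\"%s\" -subdir=yes %s/*'\n" "$desc" "$dir"
}
```

For example, archive_cmd "My vip data" /u/username/mydir prints a single-line command that can be reviewed and then pasted into the shell.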

     

    For more details, see the TSM Backup and Archive Client manual.

     


  9. I have a question which is not answered here. Who should I ask?

    The TSM System Administrators.

     
