- About Directories and Files
- Using LSF with Non-Shared File Systems
- Remote File Access
- File Transfer Mechanism (lsrcp)
About Directories and Files
LSF is designed for networks where all hosts have shared file systems, and files have the same names on all hosts.
LSF includes support for copying user data to the execution host before running a batch job, and for copying results back after the job executes.
In networks where the file systems are not shared, this can be used to give remote jobs access to local data.
Supported file systems
On UNIX systems, LSF supports the following shared file systems:
- Network File System (NFS)
  NFS file systems can be mounted permanently or on demand using automount.
- Andrew File System (AFS)
- Distributed File System (DCE/DFS)
On Windows, directories containing LSF files can be shared among hosts from a Windows server machine.
Non-shared directories and files
LSF is usually used in networks with shared file space. When shared file space is not available, LSF can copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes. See Remote File Access for more information.
Some networks do not share files between hosts. LSF can still be used on these networks, with reduced fault tolerance. See Using LSF with Non-Shared File Systems for information about using LSF in a network without a shared file system.
Using LSF with Non-Shared File Systems
LSF installation
To install LSF on a cluster without shared file systems, follow the complete installation procedure on every host to install all the binaries, man pages, and configuration files.
Configuration files
After you have installed LSF on every host, you must update the configuration files on all hosts so that they contain the complete cluster configuration. Configuration files must be the same on all hosts.
Master host
You must choose one host to act as the LSF master host. LSF configuration files and working directories must be installed on this host, and the master host must be listed first in lsf.cluster.cluster_name.
You can use the parameter LSF_MASTER_LIST in lsf.conf to define which hosts can be considered to be elected master hosts. In some cases, this may improve performance.
Fault tolerance
Some fault tolerance can be introduced by choosing more than one host as a possible master host, and using NFS to mount the LSF working directory on only these hosts. All the possible master hosts must be listed first in lsf.cluster.cluster_name. As long as one of these hosts is available, LSF continues to operate.
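As a rough illustration of the LSF_MASTER_LIST parameter described above, an lsf.conf entry might look like the following sketch. The host names hostA and hostB are placeholders, not values taken from this document. The same hosts would also have to be listed first in lsf.cluster.cluster_name.

```shell
# lsf.conf fragment (sketch only).
# hostA and hostB are hypothetical host names; substitute your own.
LSF_MASTER_LIST="hostA hostB"
```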
Remote File Access
Using LSF with non-shared file space
LSF is usually used in networks with shared file space. When shared file space is not available, use the bsub -f command to have LSF copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.
LSF attempts to run a job in the directory where the bsub command was invoked. If the execution directory is under the user's home directory, sbatchd looks for the path relative to the user's home directory. This handles some common configurations, such as cross-mounting user home directories with the /net automount option.
If the directory is not available on the execution host, the job is run in /tmp. Any files created by the batch job, including the standard output and error files created by the -o and -e options to bsub, are left on the execution host.
LSF provides support for moving user data from the submission host to the execution host before executing a batch job, and from the execution host back to the submitting host after the job completes. The file operations are specified with the -f option to bsub.
LSF uses the lsrcp command to transfer files. lsrcp contacts RES on the remote host to perform file transfer. If RES is not available, the UNIX rcp command is used. See File Transfer Mechanism (lsrcp) for more information.
bsub -f
The -f "[local_file operator [remote_file]]" option to the bsub command copies a file between the submission host and the execution host. To specify multiple files, repeat the -f option.
- local_file
  File name on the submission host
- remote_file
  File name on the execution host
The files local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When the file remote_file is not specified, it is assumed to be the same as local_file. Including local_file without the operator results in a syntax error.
- operator
  Operation to perform on the file. The operator must be surrounded by white space.
Valid values for operator are:
- >
  local_file on the submission host is copied to remote_file on the execution host before job execution. remote_file is overwritten if it exists.
- <
  remote_file on the execution host is copied to local_file on the submission host after the job completes. local_file is overwritten if it exists.
- <<
  remote_file is appended to local_file after the job completes. local_file is created if it does not exist.
- ><
  Equivalent to performing the > and then the < operation. The file local_file is copied to remote_file before the job executes, and remote_file is copied back, overwriting local_file, after the job completes. <> is the same as ><.
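The effect of each operator can be mimicked on a single machine, without LSF, by letting two scratch directories stand in for the two hosts. The sketch below is only a local simulation using cp and cat. The directory and file names are made up for illustration, and in a real cluster the transfers are carried out by LSF, not by these commands.

```shell
#!/bin/sh
# Local simulation only: no LSF commands are run here.
sub=$(mktemp -d)    # stands in for the submission host
exe=$(mktemp -d)    # stands in for the execution host

# ">"  : local_file is copied to remote_file before the job runs
echo "input" > "$sub/data"
cp "$sub/data" "$exe/data"

# The "job": reads its input, writes a result file and a log line
tr 'a-z' 'A-Z' < "$exe/data" > "$exe/result"
echo "run 2" > "$exe/log"

# "<"  : remote_file is copied back, overwriting local_file
echo "stale" > "$sub/result"
cp "$exe/result" "$sub/result"

# "<<" : remote_file is appended to local_file
echo "run 1" > "$sub/log"
cat "$exe/log" >> "$sub/log"

cat "$sub/result"    # the stale copy has been replaced
cat "$sub/log"       # both log lines are present
```

The >< operator corresponds to doing the first cp before the job and the second cp after it, on the same file.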
If the submission and execution hosts have different directory structures, you must ensure that the directory where remote_file and local_file will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.
You should specify remote_file as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.
bsub -i
If the input file specified with bsub -i is not found on the execution host, the file is copied from the submission host using the LSF remote file access facility and is removed from the execution host after the job finishes.
bsub -o and bsub -e
The output files specified with the -o and -e arguments to bsub are created on the execution host, and are not copied back to the submission host by default. You can use the remote file access facility to copy these files back to the submission host if they are not on a shared file system.
For example, the following command stores the job output in the job_out file and copies the file back to the submission host:
  % bsub -o job_out -f "job_out <" myjob
Example
To submit myjob to LSF, with input taken from the file /data/data3 and the output copied back to /data/out3, run the command:
  % bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3
To run the job batch_update, which updates the batch_data file in place, you need to copy the file to the execution host before the job runs and copy it back after the job completes:
  % bsub -f "batch_data <>" batch_update batch_data
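Following the earlier advice to give remote_file as a bare file name on non-shared file systems, the -f argument can be derived from the local path. The helper below is hypothetical (f_spec is not an LSF command). It only builds and prints the string, so it runs without LSF installed.

```shell
#!/bin/sh
# f_spec is a hypothetical helper, not part of LSF.
# It builds a "local_file > remote_file" argument whose remote_file
# has no directory part, so the copy lands in the job's working directory.
f_spec() {
    printf '%s > %s\n' "$1" "$(basename "$1")"
}

spec=$(f_spec /data/data3)
# Dry run: print the command instead of submitting it.
echo "bsub -f \"$spec\" myjob data3"
```

Printing the command first makes it easy to check the quoting and the white space around the operator before submitting a real job.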
File Transfer Mechanism (lsrcp)
The LSF remote file access mechanism (bsub -f) uses lsrcp to process the file transfer. The lsrcp command tries to connect to RES on the submission host to handle the file transfer.
See Remote File Access for more information about using bsub -f.
Limitations to lsrcp
Because LSF client hosts do not run RES, jobs that are submitted from client hosts should only specify bsub -f if rcp is allowed. You must set up the permissions for rcp if account mapping is used.
File transfer using lsrcp is not supported in the following contexts:
- If LSF account mapping is used; lsrcp fails when running under a different user account
- LSF client hosts do not run RES, so lsrcp cannot contact RES on the submission host
See User Account Mapping for more information.
Workarounds
In these situations, use the following workarounds:
If lsrcp cannot contact RES on the submission host, it attempts to use rcp to copy the file. You must set up the /etc/hosts.equiv or $HOME/.rhosts file in order to use rcp.
See the rcp(1) and rsh(1) man pages for more information on using the rcp command.
You can replace lsrcp with your own file transfer mechanism as long as it supports the same syntax as lsrcp. This might be done to take advantage of a faster interconnection network, or to overcome limitations with the existing lsrcp. sbatchd looks for the lsrcp executable in the LSF_BINDIR directory as specified in the lsf.conf file.
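A replacement for lsrcp could be as small as a wrapper script. The sketch below assumes a two-argument, source-then-target calling convention and delegates to a command named in a made-up XFER_CMD variable (scp by default). Both are assumptions to check against the lsrcp man page, and any options that lsrcp accepts are not handled here.

```shell
#!/bin/sh
# Hypothetical stand-in for lsrcp (sketch only, not the LSF implementation).
# Assumed calling convention: transfer source_file target_file
# XFER_CMD is an illustrative variable, not an LSF setting.
transfer() {
    if [ "$#" -ne 2 ]; then
        echo "usage: transfer source_file target_file" >&2
        return 2
    fi
    # Hand both arguments, unchanged, to the chosen copy command.
    "${XFER_CMD:-scp}" "$1" "$2"
}
```

To be picked up by sbatchd, a script built around such a function would have to call it with the script's own arguments and be installed under the name lsrcp in LSF_BINDIR.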
Date Modified: January 12, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.