FAQ

 


This page answers common questions about LSF and collects some tips and tricks that we have found useful, presented here as questions. If your question is not answered here, please contact the LSF administrator.

  1. What is LSF?
  2. How can I tell if LSF is running?
  3. How do I make LSF put the output from my job in a file rather than an email message?
  4. What does a job status of "PEND" mean?
  5. What happens to my AFS token if my LSF job runs for more than 25 hours?
  6. Is there an alternative to putting the bsub options on the command line?
  7. What is the difference between the bsub and bsas commands?
  8. How do I kill all of my jobs at once?

What is LSF?

LSF is the applications resource management component of the DCI. It helps us balance the workload on our UNIX servers while giving you access to the software and hardware you need to get your work done, regardless of where you are logged in.

LSF shares load within a cluster, or group of hosts. The hosts in the ATN cluster include large servers such as StatApps and SciComp, which can run resource-intensive applications like SAS or Gaussian. Other hosts are clients, such as the Isis login nodes or even personal workstations.


How can I tell if LSF is running?

The lsid command will tell you if LSF is available on your system:

% lsid 
LSF 5.0, May 31 2002
Copyright 1992-2002 Platform Computing Corporation

My cluster name is atn
My master name is lsfmaster

You can use the bhosts command to verify that the batch component of LSF is working:

% bhosts
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
calcite            ok              -      2      0      0      0      0      0
caroline           ok              -      1      0      0      0      0      0
chastity           ok              2     12      3      3      0      0      0
hazy               ok              -      -      0      0      0      0      0
lsfmaster          ok              -      4      0      0      0      0      0
macbeth            ok              -      -      2      2      0      0      0
maya               ok              -      1      0      0      0      0      0
nun                ok              3      4      2      2      0      0      0
sanger             ok              6      8      0      0      0      0      0
stormy             ok              -      4      0      0      0      0      0
sunny              ok              6     70     23     23      0      0      0
zephyr             ok              8     60     40     40      0      0      0
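
On a larger cluster the bhosts listing can be long. As a minimal sketch, the shell function below filters bhosts-style output and prints only the hosts whose STATUS is something other than ok. The name hosts_not_ok is made up for this example, and the function assumes the column layout shown above (host name first, status second).

```shell
# hosts_not_ok: hypothetical helper, not part of LSF.
# Reads bhosts-style output on stdin, skips the header line, and prints
# the name of every host whose STATUS column (field 2) is not "ok".
hosts_not_ok() {
  awk 'NR > 1 && NF >= 2 && $2 != "ok" { print $1 }'
}

# Typical use on a system where LSF is installed:
#   bhosts | hosts_not_ok
```

If the function prints nothing, every listed host reported a status of ok.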



How do I make LSF put the output from my job in a file rather than an email message?

By default, LSF includes all of the standard output (stdout) and standard error (stderr) from your job in the email message it sends you after your job finishes. However, you can send standard output to a file with the -o argument to the bsub command, and standard error to a file with the -e argument. For example,

bsub -o my.out -e my.err ...

If you specify a -o argument but do not specify a -e argument, the standard error is merged with the standard output.

The output file created by the -o option to the bsub command normally contains job report information as well as the job output. If you want to separate the job report information from the job output, use the -N option to specify that the job report information should be sent by email.

The output files specified by the -o and -e options are created on the execution host.

Tip: To uniquely identify the output from your job, add the job number to the file name using the special %J variable. For example:

bsub -o out.%J ...
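
The two ideas above (separate files for stdout and stderr, and %J in the file name) can be combined. The sketch below only builds and prints a command line and does not submit anything. The helper name bsub_log_args, the .out/.err naming scheme, and the program ./myprog are assumptions made for illustration.

```shell
# bsub_log_args: hypothetical helper, not part of LSF.
# Prints the -o/-e arguments for a given base name so that each job
# gets its own stdout and stderr files, tagged with the job ID (%J).
bsub_log_args() {
  printf '%s %s.%%J.out %s %s.%%J.err\n' -o "$1" -e "$1"
}

# Show the resulting command line without submitting anything.
# "./myprog" is a placeholder for your own program.
echo "bsub $(bsub_log_args myjob) ./myprog"
```

LSF replaces %J with the job number when the job runs, so repeated submissions do not overwrite each other's files.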

What does a job status of "PEND" mean?

An LSF job typically goes from PEND to RUN to either DONE or EXIT. Usually a job is pending for only a few seconds while LSF dispatches it to the appropriate server for execution. If it is pending for more than a few seconds, it is often for one of the following reasons:

  * You have reached the limit for the number of jobs you can run at the same time in the queue.
  * You have requested that your job run on a server which does not recognize your account.
  * The server you requested has been temporarily closed to new jobs.
  * The job has a dependency which has not been met, e.g., another job must finish first.

You can use the -lp option of the bjobs command to see why a job is pending.
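
If you want to check a job's state from a script, something like the following sketch may help. It assumes the default bjobs layout (a header line, then one line per job with JOBID, USER, and STAT as the first three fields). The function name job_state and the job number 1234 are made up for this example.

```shell
# job_state: hypothetical helper, not part of LSF.
# Prints the STAT column (PEND, RUN, DONE, EXIT, ...) for one job number.
# Assumes the default bjobs layout: a header line, then one line per
# job with JOBID, USER, STAT as the first three fields.
job_state() {
  bjobs "$1" 2>/dev/null | awk 'NR == 2 { print $3 }'
}

# Typical use on a system where LSF is installed (1234 is a made-up job number):
#   if [ "$(job_state 1234)" = "PEND" ]; then bjobs -lp 1234; fi
```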


What happens to my AFS token if my LSF job runs for more than 25 hours?

This will not be a problem. LSF automatically renews your AFS tokens for you, no matter how long your job runs.


Is there an alternative to putting the bsub options on the command line?

If you typically use the bsub command directly (rather than a command like bsas), you probably specify one or more options on the command line. If you run the same job over and over, typing the bsub command and its options can be tedious.

An alternative is to create an LSF script file containing your commands and options. For example, you could include lines like

#BSUB -q parallel
#BSUB -m zephyr
#BSUB -J myjob

and so on. The shell treats these lines as comments and ignores them, but LSF interprets them as bsub options.

Note: An LSF batch script file is not the same as a shell script, although they are similar. An LSF script file does not have to be executable, and is meant to be used as standard input to the bsub command. For example, you should type:

bsub < myscript

rather than

bsub myscript
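
To tie the pieces together, here is a minimal sketch that writes a complete script file and lists its embedded options before submission. The #BSUB -o line and the echo command are additions made for illustration; the file name myscript follows the example above.

```shell
# Write a small LSF script file. The quoted 'EOF' keeps %J and $(...)
# literal, so they are stored in the file rather than expanded now.
cat > myscript <<'EOF'
#BSUB -q parallel
#BSUB -m zephyr
#BSUB -J myjob
#BSUB -o myjob.%J.out

# Everything below runs on the execution host.
echo "Running on $(hostname)"
EOF

# List the embedded options before submitting.
grep '^#BSUB' myscript

# Submit on a system where LSF is installed (left commented out here):
#   bsub < myscript
```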

 


What is the difference between the bsub and bsas commands?

The bsub command is the basic method for submitting a job to LSF, while bsas is a special command created by ATN to make it easier for SAS users to run a batch job. If you typed

bsas myprog.sas

the command actually executed would be something like

bsub -q batch -m sunny /opt/sas8/sas myprog.sas

If you wanted to use a different queue or server, or any of the other bsub options, you would need to write your own bsub command or an LSF script file.
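
To make the relationship concrete, the sketch below shows roughly how a wrapper of this kind could be written. It is not the actual bsas source: the function name bsas_sketch is hypothetical, and it prints the bsub command instead of running it. The queue, host, and SAS path are taken from the example above.

```shell
# bsas_sketch: hypothetical stand-in for a wrapper like bsas.
# Prints the bsub command it would run instead of submitting a job.
bsas_sketch() {
  if [ "$#" -ne 1 ]; then
    echo "usage: bsas_sketch program.sas" >&2
    return 1
  fi
  echo "bsub -q batch -m sunny /opt/sas8/sas $1"
}

bsas_sketch myprog.sas
```

A wrapper like this fixes the queue and host for you, which is why a different choice means calling bsub yourself.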


 

How do I kill all of my jobs at once?

If you really want to kill all of your LSF jobs at once you can do so by specifying a job number of "0" (zero) on the bkill command:

bkill 0

Note that both running and pending jobs will be killed.

You can also limit the bkill command to only those jobs that are in a specific queue or on a specific host:

bkill -m sunny 0

bkill -q batch 0
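
Because bkill 0 affects both running and pending jobs, it can be worth checking what you have in the system first. The sketch below summarizes bjobs-style output by job state. The function name jobs_by_state is made up for this example, and it assumes STAT is the third column after a header line, as in the default bjobs layout.

```shell
# jobs_by_state: hypothetical helper, not part of LSF.
# Reads bjobs-style output on stdin and prints one "STATE COUNT" line
# per job state, assuming STAT is the third column after a header line.
jobs_by_state() {
  awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use before a bulk kill on a system where LSF is installed:
#   bjobs | jobs_by_state
```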


 
Last updated Monday, October 21, 2002 09:44 AM