Jobs are the basic unit of work in LSF. Most of what you will do with LSF involves submitting, monitoring, or controlling jobs with the following commands: bsub, bjobs, bhist, bpeek, bkill, bstop, and bresume.
Submitting jobs with bsub
You use the bsub command to submit a job in LSF. Here is a very simple bsub command:

    % bsub uname -a
    Job <125532> is submitted to default queue <batch>.

Each LSF job runs in a queue. If you don't give LSF a queue name, your job goes to the batch queue, as it did in the example above.

Each LSF job is dispatched to a server. If you don't specify the server, LSF will choose one for you. To find the name of the server and the current status of the job, use the bjobs command:

    % bjobs 125532
    JOBID   USER     STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
    125532  joeuser  DONE  batch  sunny      sunny      uname -a  Mar 28 13:37

This was a trivial job consisting of a single command, so it ran very quickly. Its status (STAT) is DONE, which means it completed successfully. If a job returns anything other than a normal completion code, its status will be EXIT.

This job executed on sunny, the same host from which it was submitted. Unless told otherwise, LSF chooses an execution host with the same architecture as the submission host, in this case Solaris SPARC. If more than one server meets that criterion, LSF chooses the most powerful host with the lightest load. If you want your job to run on a specific host, use the -m option:

    % bsub -m zephyr ...

Where is the output from this job? By default, LSF sends you email containing the standard output (stdout) and standard error (stderr) from your job, as well as some basic information about its execution. If your program writes additional output files, they are separate and are not included in this email.

Jobs submitted to the batch queue run in the background, but sometimes you need to run a job interactively in the foreground. To do this, request the interactive queue explicitly and also use the -Ip (interactive pseudo-terminal) option. Here is a simple example:

    % bsub -m chastity -q interactive -Ip uname -a
    Job <125682> is submitted to queue <interactive>.
    <<Waiting for dispatch ...>>
    <<Starting on chastity>>
    IRIX64 chastity 6.5 01091820 IP35

Note: When you run an interactive job, LSF does not send you email. Examples of programs that are often run in the interactive queue include SAS, Stata, and Mathematica.
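When you submit jobs from a script, you usually want to keep the job number that bsub prints so you can pass it to bjobs or bkill later. Below is a minimal POSIX sh sketch. The function name lsf_job_id is hypothetical, and the sketch assumes bsub prints its confirmation message in the form shown above ("Job <NNN> is submitted ...").

```shell
#!/bin/sh
# Hypothetical helper: read bsub's confirmation message on stdin and print
# only the numeric job ID, e.g. "Job <125532> is submitted to default
# queue <batch>." -> 125532. Prints nothing if no such line is found.
lsf_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Typical use on an LSF submission host (./myprog is a placeholder):
#   jobid=$(bsub ./myprog | lsf_job_id)
#   bjobs "$jobid"
```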
More bsub options
Here are some more bsub options that you may find useful.

-b
To force your job to begin at a specific time, use the -b option on the bsub command:

    bsub -b 11:00 ...

tells LSF to start your job at 11:00 a.m. If the current time is already past 11:00 a.m., the job is held until 11:00 a.m. the next day.

    bsub -b 2:15:23:15 ...

tells LSF to start the job at 11:15 p.m. on February 15 (the fields are month:day:hour:minute).

-o, -e
To save your job's output in a file instead of receiving it in email, use the -o option on the bsub command:

    bsub -o my_output ...

You can put stdout and stderr in different files if you wish:

    bsub -o my_out -e my_err ...

To make it easier to keep track of the output from multiple runs of the same program, you can use the special %J variable in your file names. LSF substitutes the job number for %J:

    bsub -o out.%J ...

-u
LSF normally sends the email containing your job output to your own account. To send it to a different address, use the -u option:

    bsub -u job_user@unc.edu ...
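The -b, -o and -e options combine naturally in a small helper script. This is a sketch, not an ATN-provided tool. The function submit_at and the DRY_RUN variable are hypothetical, and the output file names out.%J and err.%J follow the %J convention described above.

```shell
#!/bin/sh
# Hypothetical helper: submit_at TIME COMMAND [ARGS...]
# Submits COMMAND to LSF with a begin time (-b) and separate stdout/stderr
# files named after the job number (%J is expanded by LSF, not the shell).
# With DRY_RUN=1 it only prints the bsub command it would run.
submit_at() {
    _begin=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "bsub -b $_begin -o out.%J -e err.%J $*"
    else
        bsub -b "$_begin" -o 'out.%J' -e 'err.%J' "$@"
    fi
}

# Example (placeholder program): start ./myprog at 11:00 p.m.
#   submit_at 23:00 ./myprog input.dat
```

The dry-run switch lets you check the generated command on a machine without LSF before you submit anything for real.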
Submitting jobs with bsas and other scripts
Some programs are used so frequently that ATN has created special "wrapper" scripts you can use to submit those programs to LSF. These scripts construct the bsub command and submit the job for you. More information about these wrapper scripts can be found here.
Monitoring jobs with bjobs and bhist
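bjobs
The bjobs command displays the current status of one or more jobs. If used without any options, it displays all of your own pending, running, or suspended jobs:

    % bjobs
    JOBID   USER     STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
    123456  jobuser  RUN   batch  chastity   zephyr     myprog01  Mar 25 14:13

Useful options for the bjobs command include:

    -a         all jobs, including recently finished (DONE and EXIT) jobs
    -l         long format, with detailed information about each job
    -u user    jobs belonging to the named user (-u all shows every user's jobs)
    -w         wide format; fields are not truncated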
Note that you can use more than one option at a time. For example:

    bjobs -l -w -a -u joeuser
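To check a job's status from a script, you can pick the STAT column out of the bjobs output. The sketch below assumes the default column layout shown above, where JOBID, USER and STAT are the first three fields. The function names lsf_job_stat and wait_for_job and the 60-second polling interval are hypothetical choices.

```shell
#!/bin/sh
# Print the STAT field of job $1 from bjobs output read on stdin.
# Skips the header line; assumes JOBID, USER, STAT are columns 1-3.
lsf_job_stat() {
    awk -v id="$1" 'NR > 1 && $1 == id { print $3 }'
}

# Hypothetical helper (requires LSF): poll bjobs every 60 seconds until
# job $1 reaches DONE or EXIT, then print that status. Pending, running
# and suspended states keep the loop going.
# Returns 0 for DONE, 1 for EXIT, 2 if bjobs no longer knows the job.
wait_for_job() {
    while :; do
        _stat=$(bjobs "$1" 2>/dev/null | lsf_job_stat "$1")
        case $_stat in
            DONE) printf '%s\n' "$_stat"; return 0 ;;
            EXIT) printf '%s\n' "$_stat"; return 1 ;;
            "")   printf 'job %s not found\n' "$1" >&2; return 2 ;;
        esac
        sleep 60
    done
}
```

LSF keeps finished jobs visible to bjobs only for a limited time. For that reason, wait_for_job treats a job that has disappeared as an error instead of polling forever.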
bhist
The bhist command displays historical information about jobs.
bpeek
The bpeek command displays the stdout and stderr of a job while it is running.
Controlling jobs with bkill, bstop and bresume
bkill
The bkill command is usually used to kill a running, pending, or suspended job. More precisely, bkill causes LSF to send SIGINT and SIGTERM to a job to give it a chance to clean up, and then LSF sends SIGKILL to kill the job. You can only kill your own jobs.
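Because bkill sends SIGINT and SIGTERM before SIGKILL, a job script can catch the first two signals and remove its temporary files. Here is a minimal POSIX sh sketch. It assumes your job keeps temporary files in a scratch directory and that mktemp -d is available on the execution host. The names make_scratch, SCRATCH and lsfjob.XXXXXX are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create a scratch directory in $SCRATCH that is
# removed when the job ends normally or is killed with bkill.
make_scratch() {
    SCRATCH=$(mktemp -d "${TMPDIR:-/tmp}/lsfjob.XXXXXX") || exit 1
    # Remove the directory whenever this shell exits.
    trap 'rm -rf "$SCRATCH"' EXIT
    # bkill sends SIGINT and SIGTERM first: turn each into a normal exit
    # (status 128 + signal number) so the EXIT trap above runs.
    # The SIGKILL that follows cannot be caught.
    trap 'exit 130' INT
    trap 'exit 143' TERM
}

# Example job body (./myprog is a placeholder):
#   make_scratch
#   ./myprog > "$SCRATCH/partial.out"
```

Routing both signals through exit keeps the clean-up in a single place, the EXIT trap, which also runs when the job finishes on its own.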
bstop
The bstop command suspends a job by sending it the SIGSTOP signal.
bresume
The bresume command resumes a suspended job by sending it the SIGCONT signal.
Last updated Thursday, May 23, 2002 04:35 PM