



Job Arrays


LSF provides a structure called a job array that allows a sequence of jobs that share the same executable and resource requirements, but have different input files, to be submitted, controlled, and monitored as a single unit. Using the standard LSF commands, you can also control and monitor individual jobs and groups of jobs submitted from a job array.

After the job array is submitted, LSF independently schedules and dispatches the individual jobs. Each job submitted from a job array shares the job ID of the job array and is uniquely referenced by an array index. The dimension and structure of a job array are defined when the job array is created.



Creating a Job Array

A job array is created at job submission time using the -J option of bsub. For example, the following command creates a job array named myArray made up of 1000 jobs.

% bsub -J "myArray[1-1000]" myJob
Job <123> is submitted to default queue <normal>.

Syntax

The bsub syntax used to create a job array follows:

% bsub -J "arrayName[indexList, ...]" myJob

Where:

-J "arrayName[indexList, ...]"

Names and creates the job array. The square brackets, [ ], around indexList must be entered exactly as shown, and the job array name specification must be enclosed in quotes. Commas (,) separate multiple indexList entries. The maximum length of this specification is 255 characters.

arrayName

User-specified string used to identify the job array. Valid values are any combination of the following characters:

a-z | A-Z | 0-9 | . | - | _

indexList = start[-end[:step]]

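Specifies the size and dimension of the job array, where:

start

Used with end to specify the start of a range of indices. Used alone, it specifies a single index. Valid values are unique positive integers. For example, [1-5] and [1, 2, 3, 4, 5] both specify five jobs with indices 1 through 5.

end

Used with start to specify the end of a range of indices. Valid values are unique positive integers.

step

Specifies the increment between indices in a range. Indices begin at start, increase by step, and do not go past end. The default value is 1. Valid values are positive integers. For example, [1-10:2] specifies jobs with indices 1, 3, 5, 7, and 9.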
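The syntax rules above can be checked before a job array is submitted. The following Python sketch is illustrative only and is not part of LSF: parse_array_spec and expand_index_list are hypothetical helper names, and treating a repeated index as an error is an assumption about how strictly uniqueness is enforced.

```python
import re

MAX_SPEC_LENGTH = 255                          # stated limit for the -J specification
NAME_PATTERN = re.compile(r"[A-Za-z0-9._-]+")  # a-z | A-Z | 0-9 | . | - | _

def expand_index_list(entry):
    """Expand one 'start[-end[:step]]' entry into a list of indices."""
    match = re.fullmatch(r"([0-9]+)(?:-([0-9]+)(?::([0-9]+))?)?", entry)
    if match is None:
        raise ValueError(f"malformed indexList entry: {entry!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    step = int(match.group(3)) if match.group(3) else 1
    if start < 1 or end < start or step < 1:
        raise ValueError(f"invalid range: {entry!r}")
    # Indices begin at start, advance by step, and never go past end.
    return list(range(start, end + 1, step))

def parse_array_spec(spec):
    """Split 'arrayName[indexList, ...]' into the name and a sorted list of indices."""
    if len(spec) > MAX_SPEC_LENGTH:
        raise ValueError("specification exceeds 255 characters")
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", spec)
    if match is None:
        raise ValueError("expected arrayName[indexList, ...]")
    name, index_lists = match.groups()
    if NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"invalid character in array name: {name!r}")
    indices = []
    for entry in index_lists.split(","):
        indices.extend(expand_index_list(entry.strip()))
    # Assumption: a repeated index is treated as an error.
    if len(indices) != len(set(indices)):
        raise ValueError("indices must be unique")
    return name, sorted(indices)

print(parse_array_spec("myArray[1-10:2, 20]"))  # ('myArray', [1, 3, 5, 7, 9, 20])
```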

After the job array is created (submitted), individual jobs are referenced using the job array name or job ID and an index value. For example, both of the following series refer to jobs submitted from a job array named myArray, which is made up of 1000 jobs and has a job ID of 123:

myArray[1], myArray[2], myArray[3], ..., myArray[1000]
123[1], 123[2], 123[3], ..., 123[1000]

Maximum size of a job array

A large job array allows a user to submit a large number of jobs to the system with a single job submission.

By default, a job array can contain at most 1000 jobs.

To change this maximum, set MAX_JOB_ARRAY_SIZE in lsb.params to any number up to 65534. The number of jobs in a job array cannot exceed this value.
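As a rough pre-submission check, the number of jobs in a start-end:step entry is (end - start) // step + 1, so the size of a request can be computed without expanding it. The Python sketch below uses hypothetical function names, assumes the indexList entries do not overlap, and is not how LSF itself validates a submission.

```python
DEFAULT_MAX_JOB_ARRAY_SIZE = 1000  # default maximum described above

def range_size(start, end, step=1):
    """Count the indices in start-end:step without expanding the range."""
    return (end - start) // step + 1

def check_array_size(ranges, max_size=DEFAULT_MAX_JOB_ARRAY_SIZE):
    """Return the total number of jobs, or raise if it exceeds max_size.

    ranges holds one (start, end, step) tuple per indexList entry; the
    entries are assumed not to overlap.
    """
    total = sum(range_size(start, end, step) for start, end, step in ranges)
    if total > max_size:
        raise ValueError(f"{total} jobs exceeds the limit of {max_size}")
    return total

# myArray[1-1000] fits the default maximum exactly.
print(check_array_size([(1, 1000, 1)]))  # 1000
```

Passing max_size explicitly stands in for a site that has raised MAX_JOB_ARRAY_SIZE.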



Handling Input and Output Files

LSF provides methods for coordinating individual input and output files for the multiple jobs created when you submit a job array. These methods require your input files to be named and located uniformly. For an executable that uses standard input and standard output, LSF provides the substitution variables %I and %J, which are expanded at run time. For an executable that reads command line arguments, LSF provides the environment variable LSB_JOBINDEX, which is set in the execution environment.

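The two methods, redirecting standard input and output and passing arguments on the command line, are described in the following sections. Both depend on input files prepared as follows.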

Preparing input files

LSF needs all the input files for the jobs in your job array to be located in the same directory. By default, LSF assumes the current working directory (CWD), which is the directory from which bsub was issued. To override CWD, specify an absolute path when submitting the job array.

Each file name consists of two parts, a consistent name string and a variable integer that corresponds directly to an array index. For example, the following file names are valid input file names for a job array. They are made up of the consistent name input and integers that correspond to job array indices from 1 to 1000:

input.1, input.2, input.3, ..., input.1000
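Because each job reads its own file, it can help to confirm that every expected file exists before submitting. The Python sketch below (missing_inputs is a hypothetical helper, not an LSF tool) lists any expected files of the form base_name.index that are absent from a directory.

```python
import os

def missing_inputs(directory, base_name, indices):
    """Return paths of expected input files (base_name.<index>) that do not exist."""
    missing = []
    for index in indices:
        path = os.path.join(directory, f"{base_name}.{index}")
        if not os.path.isfile(path):
            missing.append(path)
    return missing

# For input.1 ... input.1000 in the current working directory:
#   problems = missing_inputs(os.getcwd(), "input", range(1, 1001))
```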



Redirecting Standard Input and Output

The variables %I and %J are used as substitution strings to support file redirection for jobs submitted from a job array. At execution time, %I is expanded to the job array index value of the current job, and %J is expanded to the job ID of the job array.

Standard input

Use the -i option of bsub and the %I variable when your executable reads from standard input. To use %I, all the input files must be named consistently with a variable part that corresponds to the indices of the job array. For example:

input.1, input.2, input.3, ..., input.N

The following command submits a job array of 1000 jobs whose input files are named input.1, input.2, input.3, ..., input.1000 and located in the current working directory:

% bsub -J "myArray[1-1000]" -i "input.%I" myJob

Standard output and error

Use the -o option of bsub and the %I and %J variables when your executable writes to standard output and error.

To create an output file that corresponds to each job submitted from a job array, specify %I as part of the output file name. For example, the following command submits a job array of 1000 jobs whose output files will be located in CWD and named output.1, output.2, output.3, ..., output.1000:

% bsub -J "myArray[1-1000]" -o "output.%I" myJob

To create output files that include the job array job ID as part of the file name, specify %J. For example, the following command submits a job array of 1000 jobs whose output files will be located in CWD and named output.123.1, output.123.2, output.123.3, ..., output.123.1000, where 123 is the job ID of the job array.

% bsub -J "myArray[1-1000]" -o "output.%J.%I" myJob
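The substitution behaves like plain string replacement. The Python model below (expand_io_pattern is a hypothetical name) shows the file names that the three example patterns in this section produce for one job; it does not model anything else LSF may do with these strings, such as escaping or other variables.

```python
def expand_io_pattern(pattern, job_id, index):
    """Replace %J with the job array job ID and %I with the job's array index."""
    return pattern.replace("%J", str(job_id)).replace("%I", str(index))

# File names used by job 5 of job array 123 with the patterns from this section.
for pattern in ("input.%I", "output.%I", "output.%J.%I"):
    print(pattern, "->", expand_io_pattern(pattern, 123, 5))
```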



Passing Arguments on the Command Line

The environment variable LSB_JOBINDEX is used as a substitution string to support passing job array indices on the command line. When the job is dispatched, LSF sets LSB_JOBINDEX in the execution environment to the job array index of the current job. LSB_JOBINDEX is set for all jobs. For non-array jobs, LSB_JOBINDEX is set to zero (0).

To use LSB_JOBINDEX, all the input files must be named consistently and with a variable part that corresponds to the indices of the job array. For example:

input.1, input.2, input.3, ..., input.N

You must escape LSB_JOBINDEX with a backslash, \, so that the shell in which you run bsub does not expand the variable at submission time. For example, the following command submits a job array of 1000 jobs whose input files are named input.1, input.2, input.3, ..., input.1000 and located in the current working directory. The executable is passed an argument that specifies the name of its input file:

% bsub -J "myArray[1-1000]" myJob -f input.\$LSB_JOBINDEX
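By the time myJob starts, the escaped $LSB_JOBINDEX in its argument has been replaced with the job's index, so the executable sees an ordinary file name. The following Python sketch of a hypothetical myJob parses -f, also reads LSB_JOBINDEX from its environment (falling back to 0 when the variable is absent, for example when testing outside LSF), and counts the lines of its input file as a stand-in for real work.

```python
import argparse
import os

def main(argv=None):
    """Hypothetical myJob: process the file passed with -f."""
    parser = argparse.ArgumentParser(prog="myJob")
    parser.add_argument("-f", dest="input_file", required=True)
    args = parser.parse_args(argv)

    # LSF sets LSB_JOBINDEX for every job (0 for non-array jobs); the "0"
    # default only matters when the script is run outside LSF.
    index = int(os.environ.get("LSB_JOBINDEX", "0"))

    # Placeholder work: count the lines of this job's input file.
    with open(args.input_file) as f:
        line_count = sum(1 for _ in f)
    print(f"job array index {index}: {args.input_file} has {line_count} lines")
    return index, line_count
```

To use it as the actual executable, call main() from a script entry point; argparse then reads the arguments passed on the bsub command line.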



Job Array Dependencies

Like all jobs in LSF, a job array can be dependent on the completion or partial completion of a job or another job array. A number of job-array-specific dependency conditions are provided by LSF.

Whole array dependency

To make a job array dependent on the completion of a job or another job array, use the -w "dependency_condition" option of bsub. For example, to make an array dependent on the completion of a job or job array with job ID 123, use the following command:

% bsub -w "done(123)" -J "myArray2[1-1000]" myJob

Partial array dependency

To make a job or job array dependent on an existing job array, use one of the following dependency conditions.

Condition                         Description
numrun(jobArrayJobId, op num)     Evaluate the number of jobs in RUN state
numpend(jobArrayJobId, op num)    Evaluate the number of jobs in PEND state
numdone(jobArrayJobId, op num)    Evaluate the number of jobs in DONE state
numexit(jobArrayJobId, op num)    Evaluate the number of jobs in EXIT state
numended(jobArrayJobId, op num)   Evaluate the number of jobs in DONE and EXIT states
numhold(jobArrayJobId, op num)    Evaluate the number of jobs in PSUSP state
numstart(jobArrayJobId, op num)   Evaluate the number of jobs in RUN, SSUSP, and USUSP states

Use one of the following operators (op) combined with a positive integer (num) to build a condition:

== | > | < | >= | <= | !=

Optionally, an asterisk (*) can be used in place of num to mean all jobs submitted from the job array.

For example, to start a job named myJob when 100 or more elements in a job array with job ID 123 have completed successfully:

% bsub -w "numdone(123, >= 100)" myJob
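The counting behind these conditions can be modeled in a few lines. The Python sketch below (condition_met is a hypothetical name) evaluates a condition such as numdone(123, >= 100) against a list of per-job states, using the state groupings from the condition table and treating * as the total number of jobs in the array. It models only the arithmetic, not when or how LSF re-evaluates dependencies.

```python
import operator

# Operators accepted in partial array dependency conditions.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

# Job states counted by each condition, per the dependency condition table.
CONDITION_STATES = {
    "numrun": {"RUN"},
    "numpend": {"PEND"},
    "numdone": {"DONE"},
    "numexit": {"EXIT"},
    "numended": {"DONE", "EXIT"},
    "numhold": {"PSUSP"},
    "numstart": {"RUN", "SSUSP", "USUSP"},
}

def condition_met(condition, op, num, job_states):
    """Evaluate e.g. numdone(123, >= 100) against one state per array job.

    num may be "*", meaning all jobs submitted from the job array.
    """
    states = CONDITION_STATES[condition]
    count = sum(1 for state in job_states if state in states)
    target = len(job_states) if num == "*" else num
    return OPERATORS[op](count, target)

# 1000 jobs: 100 DONE, 5 RUN, 895 PEND.
example = ["DONE"] * 100 + ["RUN"] * 5 + ["PEND"] * 895
print(condition_met("numdone", ">=", 100, example))  # True
```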



Monitoring Job Arrays

Use bjobs and bhist to monitor the current and past status of job arrays.

Job array status

To display summary information about all the jobs submitted from a job array, use the -A option of bjobs. The output counts the jobs in each state. For example, for a job array of 10 jobs with job ID 123:

% bjobs -A 123
JOBID  ARRAY_SPEC     OWNER  NJOBS PEND DONE  RUN EXIT SSUSP USUSP PSUSP
123    myArray[1-10]  user1     10    3    3    4    0     0     0     0

Individual job status

Current

To display the status of the individual jobs submitted from a job array, specify the job array job ID with bjobs. For jobs submitted from a job array, JOBID displays the job array job ID, and JOB_NAME displays the job array name and the index value of each job. For example, to view a job array with job ID 123:

% bjobs 123
JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST   JOB_NAME    SUBMIT_TIME
123    user1  DONE   default   hostA      hostC       myArray[1]  Feb 29 12:34
123    user1  DONE   default   hostA      hostQ       myArray[2]  Feb 29 12:34
123    user1  DONE   default   hostA      hostB       myArray[3]  Feb 29 12:34
123    user1  RUN    default   hostA      hostC       myArray[4]  Feb 29 12:34
123    user1  RUN    default   hostA      hostL       myArray[5]  Feb 29 12:34
123    user1  RUN    default   hostA      hostB       myArray[6]  Feb 29 12:34
123    user1  RUN    default   hostA      hostQ       myArray[7]  Feb 29 12:34
123    user1  PEND   default   hostA                  myArray[8]  Feb 29 12:34
123    user1  PEND   default   hostA                  myArray[9]  Feb 29 12:34
123    user1  PEND   default   hostA                  myArray[10] Feb 29 12:34
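The counters that bjobs -A reports correspond to tallying the STAT column of this per-job listing. The Python sketch below performs that tally on the sample output above; state_counts is a hypothetical helper, and parsing human-readable bjobs output is fragile, so treat it as an illustration of how the two views relate rather than a monitoring tool.

```python
from collections import Counter

# Sample output copied from the bjobs 123 example.
SAMPLE_BJOBS = """\
JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST   JOB_NAME    SUBMIT_TIME
123    user1  DONE   default   hostA      hostC       myArray[1]  Feb 29 12:34
123    user1  DONE   default   hostA      hostQ       myArray[2]  Feb 29 12:34
123    user1  DONE   default   hostA      hostB       myArray[3]  Feb 29 12:34
123    user1  RUN    default   hostA      hostC       myArray[4]  Feb 29 12:34
123    user1  RUN    default   hostA      hostL       myArray[5]  Feb 29 12:34
123    user1  RUN    default   hostA      hostB       myArray[6]  Feb 29 12:34
123    user1  RUN    default   hostA      hostQ       myArray[7]  Feb 29 12:34
123    user1  PEND   default   hostA                  myArray[8]  Feb 29 12:34
123    user1  PEND   default   hostA                  myArray[9]  Feb 29 12:34
123    user1  PEND   default   hostA                  myArray[10] Feb 29 12:34
"""

def state_counts(bjobs_text):
    """Tally the STAT column of bjobs output, one count per job state."""
    lines = bjobs_text.strip().splitlines()
    stat_field = lines[0].split().index("STAT")
    # STAT comes before EXEC_HOST, so the empty EXEC_HOST of PEND jobs does
    # not shift it when each row is split on whitespace.
    return Counter(line.split()[stat_field] for line in lines[1:])

print(dict(state_counts(SAMPLE_BJOBS)))  # {'DONE': 3, 'RUN': 4, 'PEND': 3}
```

The result matches the PEND, DONE, and RUN counters (3, 3, and 4) in the bjobs -A example.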

Past

To display the past status of the individual jobs submitted from a job array, specify the job array job ID with bhist. For example, to view the history of a job array with job ID 456:

% bhist 456
Summary of time in seconds spent in various states:
JOBID  USER    JOB_NAME   PEND    PSUSP   RUN     USUSP   SSUSP   UNKWN   TOTAL
456[1] user1   *rray[1]   14      0       65      0       0       0       79
456[2] user1   *rray[2]   74      0       25      0       0       0       99
456[3] user1   *rray[3]   121     0       26      0       0       0       147
456[4] user1   *rray[4]   167     0       30      0       0       0       197
456[5] user1   *rray[5]   214     0       29      0       0       0       243
456[6] user1   *rray[6]   250     0       35      0       0       0       285
456[7] user1   *rray[7]   295     0       33      0       0       0       328
456[8] user1   *rray[8]   339     0       29      0       0       0       368
456[9] user1   *rray[9]   356     0       26      0       0       0       382
456[10]user1   *ray[10]   375     0       24      0       0       0       399

Specific job status

Current

To display the current status of a specific job submitted from a job array, specify the job array job ID and an index value, in quotes, with bjobs. For example, to view the status of the 5th job in a job array with job ID 123:

% bjobs "123[5]"
JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST   JOB_NAME    SUBMIT_TIME
123    user1  RUN    default   hostA      hostL       myArray[5]  Feb 29 12:34

Past

To display the past status of a specific job submitted from a job array, specify, in quotes, the job array job ID and an index value with bhist. For example, the status of the 5th job in a job array with job ID 456:

% bhist "456[5]"
Summary of time in seconds spent in various states:
JOBID  USER    JOB_NAME   PEND    PSUSP   RUN     USUSP   SSUSP   UNKWN   TOTAL
456[5] user1   *rray[5]   214     0       29      0       0       0       243



Controlling Job Arrays

You can control the whole array, all the jobs submitted from the job array, with a single command. LSF also provides the ability to control individual jobs and groups of jobs submitted from a job array. When issuing commands against a job array, use the job array job ID instead of the job array name. Job names are not unique in LSF, and issuing a command using a job array name may result in unpredictable behavior.

Most LSF commands can operate on the whole job array, on individual jobs, and on groups of jobs. These commands include bkill, bstop, bresume, and bmod.

Some commands only allow operation on individual jobs submitted from a job array. These commands include btop, bbot, and bswitch.
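These rules can be expressed as a small check on a job ID specification. In the Python sketch below, the command sets come from the two sentences above (which name examples, not complete lists), and target_kind and accepts are hypothetical helpers; LSF performs its own validation when a command is issued.

```python
import re

# Command groupings as described in this section.
ARRAY_AND_GROUP_COMMANDS = {"bkill", "bstop", "bresume", "bmod"}
INDIVIDUAL_ONLY_COMMANDS = {"btop", "bbot", "bswitch"}

def target_kind(spec):
    """Classify '123', '123[5]', or '123[1-5, 239]' as whole, individual, or group."""
    match = re.fullmatch(r"([0-9]+)(?:\[([^\]]+)\])?", spec)
    if match is None:
        raise ValueError(f"not a job ID specification: {spec!r}")
    index_list = match.group(2)
    if index_list is None:
        return "whole"
    if re.fullmatch(r"\s*[0-9]+\s*", index_list):
        return "individual"
    return "group"

def accepts(command, spec):
    """Whether the command, as described above, can operate on this target."""
    kind = target_kind(spec)
    if command in INDIVIDUAL_ONLY_COMMANDS:
        return kind == "individual"
    if command in ARRAY_AND_GROUP_COMMANDS:
        return True
    raise ValueError(f"command not covered in this section: {command}")

print(accepts("btop", "123[5]"), accepts("btop", "123"))  # True False
```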

Whole array

To control the whole job array, specify the command as you would for a single job using only the job ID. For example, to kill a job array with job ID 123:

% bkill 123

Individual jobs

To control an individual job submitted from a job array, specify the command using the job ID of the job array and the index value of the corresponding job. The job ID and index value must be enclosed in quotes. For example, to kill the 5th job in a job array with job ID 123:

% bkill "123[5]"

Groups of jobs

To control a group of jobs submitted from a job array, specify the command as you would for an individual job and use indexList syntax to indicate the jobs. For example, to kill jobs 1-5, 239, and 487 in a job array with job ID 123:

% bkill "123[1-5, 239, 487]"
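When the jobs to act on come from a list, for example the indices of failed jobs, a helper can collapse them into indexList syntax. The hypothetical Python functions below produce the same 123[1-5, 239, 487] form used in the bkill example.

```python
def to_index_list(indices):
    """Collapse job array indices into indexList syntax, e.g. [1, 2, 3, 7] -> '1-3, 7'."""
    ordered = sorted(set(indices))
    parts = []
    i = 0
    while i < len(ordered):
        start = ordered[i]
        # Extend the run while the next index is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == ordered[i] + 1:
            i += 1
        end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ", ".join(parts)

def job_spec(job_id, indices):
    """Build a 'jobID[indexList]' argument for commands such as bkill."""
    return f"{job_id}[{to_index_list(indices)}]"

print(job_spec(123, [1, 2, 3, 4, 5, 239, 487]))  # 123[1-5, 239, 487]
```

When the result is passed through Python's subprocess module as a list element, for example subprocess.run(["bkill", job_spec(123, indices)]), no extra quoting is needed because no shell interprets the brackets.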



Requeuing a Job Array

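Use brequeue to requeue a job array. When a job is requeued, it is assigned PEND status and placed in the queue after other jobs of the same priority. You can requeue jobs in DONE state, jobs in EXIT state, all jobs in an array regardless of state, RUN jobs to PSUSP state, and jobs in RUN state.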

brequeue is not supported across clusters.

Requeuing jobs in DONE state

To requeue DONE jobs use the -d option of brequeue. For example, the command brequeue -J "myarray[1-10]" -d 123 requeues jobs with job ID 123 and DONE status.

Requeuing Jobs in EXIT state

To requeue EXIT jobs use the -e option of brequeue. For example, the command brequeue -J "myarray[1-10]" -e 123 requeues jobs with job ID 123 and EXIT status.

Requeuing all jobs in an array regardless of job state

A submitted job array can have jobs that have different job states. To requeue all the jobs in an array regardless of any job's state, use the -a option of brequeue. For example, the command brequeue -J "myarray[1-10]" -a 123 requeues all jobs in a job array with job ID 123 regardless of their job state.

Requeuing RUN jobs to PSUSP state

To requeue RUN jobs to PSUSP state, use the -H option of brequeue. For example, the command brequeue -J "myarray[1-10]" -H 123 requeues the RUN jobs in the job array with job ID 123 to PSUSP state.

Requeuing jobs in RUN state

To requeue RUN jobs use the -r option of brequeue. For example, the command brequeue -J "myarray[1-10]" -r 123 requeues jobs with job ID 123 and RUN status.



Job Array Job Slot Limit

The job array job slot limit specifies the maximum number of jobs submitted from a job array that are allowed to run at any one time. Because a job array lets a single command submit a large number of jobs, it can flood a system; a job slot limit restricts the impact a job array can have. Job array job slot limits are specified using the following syntax:

% bsub -J "arrayName[indexList]%jobLimit" myJob

where:

%jobLimit

Specifies the maximum number of jobs allowed to run at any one time. The percent sign, %, must be entered exactly as shown. Valid values are positive integers less than the maximum index value of the job array.
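To get a feel for the effect of a limit, the following Python sketch models an array in which at most jobLimit jobs run at once and each waiting job starts as soon as a running one finishes. It is a simplified model with hypothetical job durations; actual dispatch also depends on available hosts, queue configuration, and scheduling policies. validate_job_limit applies the rule stated above, and both function names are hypothetical.

```python
import heapq

def validate_job_limit(job_limit, max_index):
    """Check the stated rule: a positive integer less than the maximum index."""
    if not (isinstance(job_limit, int) and 0 < job_limit < max_index):
        raise ValueError(f"jobLimit must be a positive integer less than {max_index}")
    return job_limit

def makespan(durations, job_limit):
    """Time until the last job finishes when at most job_limit jobs run at once.

    Simplified model: jobs start in index order as soon as one of the
    job_limit slots is free, and nothing else competes for the hosts.
    """
    if job_limit < 1:
        raise ValueError("job_limit must be at least 1")
    # Each heap entry is the time at which one allowed slot becomes free.
    slots = [0.0] * min(job_limit, len(durations))
    heapq.heapify(slots)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(slots)
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(slots, end)
    return finish

# 1000 equal-length jobs (hypothetical 1-unit runtime) with a limit of 100.
print(makespan([1] * 1000, validate_job_limit(100, 1000)))  # 10.0
```

Under these assumptions, 1000 equal-length jobs with a limit of 100 take about ten job lengths to finish, instead of one if all could run at once.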

Setting a job array job slot limit

A job array job slot limit can be set at the time of submission using bsub, or after submission using bmod.

At Submission

For example, to set a job array job slot limit of 100 jobs for a job array of 1000 jobs:

% bsub -J "jobArrayName[1-1000]%100" myJob

After submission

For example, to set a job array job slot limit of 100 jobs for an array with job ID 123:

% bmod -J "%100" 123

Changing a job array job slot limit

Changing a job array job slot limit is the same as setting it after submission. For example, to change a job array job slot limit to 250 for a job array with job ID 123:

% bmod -J "%250" 123

Viewing a job array job slot limit

To view job array job slot limits, use the -A and -l options of bjobs. The job array job slot limit is displayed in the Job Name field in the same format in which it was set. For example, the following output displays the job array job slot limit of 100 for a job array with job ID 123:

% bjobs -A -l 123
Job <123>, Job Name <myArray[1-1000]%100>, User <user1>, Project <default>, Sta
                     tus <PEND>, Queue <normal>, Job Priority <20>, Command <my
                     Job>
Wed Feb 29 12:34:56: Submitted from host <hostA>, CWD <$HOME>;
 
 COUNTERS:
 NJOBS PEND DONE RUN EXIT SSUSP USUSP PSUSP
    10    9   0    1    0     0     0     0



      Date Modified: January 12, 2004
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.