Specifying Resource Requirements


About Resource Requirements

Resource requirements define which hosts a job can run on. Each job has its own resource requirements. Hosts that match the resource requirements are the candidate hosts. When LSF schedules a job, it uses the load index values of all the candidate hosts. The load values for each host are compared to the scheduling conditions. Jobs are only dispatched to a host if all load values are within the scheduling thresholds.

By default, if a job has no resource requirements, LSF places it on a host of the same type as the submission host (i.e., type==local). However, if a job has string or Boolean resource requirements specified and the host type has not been specified, LSF places the job on any host (i.e., type==any) that satisfies the resource requirements.

To override the LSF defaults, specify resource requirements explicitly. Resource requirements can be set for queues, for individual applications, or for individual jobs.

To place jobs where they perform best, you can specify resource requirements for each application. This way, you do not have to specify resource requirements every time you submit a job. The LSF administrator may have already configured the resource requirements for your jobs, or you can put your executable name together with its resource requirements into your personal remote task list.

The bsub command automatically uses the resource requirements of the job from the remote task lists.

A resource requirement is an expression that contains resource names and operators.


Queue-Level Resource Requirements

Each queue can define resource requirements that will be applied to all the jobs in the queue.

When resource requirements are specified for a queue, and no job-level resource requirement is specified, the queue-level resource requirements become the default resource requirements for the job.

Syntax

The condition for dispatching a job to a host can be specified through the queue-level RES_REQ parameter in the queue definition in lsb.queues.

Examples

RES_REQ=select[((type==ALPHA && r1m < 2.0)||(type==HPPA && r1m < 1.0))]

This will allow a queue, which contains ALPHA and HPPA hosts, to have different thresholds for different types of hosts.

RES_REQ=select[((hname==hostA && mem > 50)||(hname==hostB && mem > 100))]

Using the hname resource in the resource requirement string allows you to set up different conditions for different hosts in the same queue.

Load thresholds

Load thresholds can be configured by your LSF administrator to schedule jobs in queues. Load thresholds specify a load index value. There are two types of load thresholds:

loadSched

The scheduling threshold which determines the load condition for dispatching pending jobs. If a host's load is beyond any defined loadSched, a job will not be started on the host. This threshold is also used as the condition for resuming suspended jobs.

loadStop

The suspending condition that determines when running jobs should be suspended.

Thresholds can be configured for each queue, for each host, or a combination of both. To schedule a job on a host, the load levels on that host must satisfy both the thresholds configured for that host and the thresholds for the queue from which the job is being dispatched.

The value of a load index may either increase or decrease with load, depending on the meaning of the specific load index. Therefore, when comparing the host load conditions with the threshold values, you need to use either greater than (>) or less than (<), depending on the load index.
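
The comparison described above can be sketched in Python. This is an illustration only, not LSF's implementation: the direction table, the function names, and the choice to accept a value that sits exactly on the threshold are all assumptions.

```python
# Assumed direction of each load index: True if the value rises as load rises.
INCREASES_WITH_LOAD = {
    "r15s": True, "r1m": True, "r15m": True, "ut": True, "pg": True,
    "mem": False, "swp": False, "tmp": False, "it": False,
}

def within_threshold(index, value, threshold):
    """Return True if one load value is acceptable for one threshold."""
    if INCREASES_WITH_LOAD[index]:
        return value <= threshold   # e.g. run queue length must not exceed it
    return value >= threshold       # e.g. free memory must not fall below it

def can_dispatch(load, host_sched, queue_sched):
    """A host qualifies only if it satisfies both host and queue thresholds."""
    for thresholds in (host_sched, queue_sched):
        for index, threshold in thresholds.items():
            if not within_threshold(index, load[index], threshold):
                return False
    return True

# Hypothetical load values and thresholds.
load = {"r1m": 0.8, "mem": 120, "ut": 0.35}
print(can_dispatch(load, {"r1m": 1.0}, {"mem": 100}))   # True
print(can_dispatch(load, {"r1m": 0.5}, {"mem": 100}))   # False
```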

See Load Thresholds for information about suspending conditions and configuring load thresholds.

Viewing queue-level resource requirements

Use bqueues -l to view resource requirements (RES_REQ) defined for the queue:

% bqueues -l normal

QUEUE: normal
  -- No description provided.  This is the default queue.
...
RES_REQ:  select[type==any] rusage[mem=10,dynamic_rsrc=10:duration=2:decay=1]
...


Job-Level Resource Requirements

Each job can specify resource requirements. Job-level resource requirements override any resource requirements specified in the remote task list.

In some cases, the queue specification sets an upper or lower bound on a resource. If you attempt to exceed that bound, your job will be rejected.

Syntax

To specify resource requirements for your job, use bsub -R and specify the resource requirement string as usual.

Example

% bsub -R "swp > 15 && hpux order[cpu]" myjob

This runs myjob on an HP-UX host that has more than 15 MB of swap space available, preferring the most lightly loaded candidate as ordered by cpu (an alias for r1m, the 1-minute run queue length).

Viewing job-level resource requirements

Use bjobs -l to view resource requirements defined for the job:

% bsub -R type==any -q normal myjob
Job <2533> is submitted to queue <normal>.
% bjobs -l 2533
Job <2533>, User <user1>, Project <default>, Status <DONE>, 
Queue <normal>,
	 	      Command <myjob>
Fri May 10 17:21:26: Submitted from host <hostA>, CWD <$HOME>, 
Requested Resources <type==any>;
Fri May 10 17:21:31: Started on <hostB>, Execution Home 
</home/user1>,Execution CWD </home/user1>;
Fri May 10 17:21:47: Done successfully. The CPU time used is 
0.3 seconds.
...

After a job is finished, use bhist -l to view resource requirements defined for the job:

% bhist -l 2533

Job <2533>, User <user1>, Project <default>, Command <myjob>
Fri May 10 17:21:26: Submitted from host <hostA>, to Queue 
<normal>, CWD
	 	      <$HOME>, Requested Resources <type==any>;
Fri May 10 17:21:31: Dispatched to <hostB>;
Fri May 10 17:21:32: Starting (Pid 1850232);
Fri May 10 17:21:33: Running with execution home 
</home/user1>, Execution
	 	      CWD </home/user1>, Execution Pid <1850232>;
Fri May 10 17:21:45: Done successfully. The CPU time used is 
0.3 seconds;
...


About Resource Requirement Strings

Most LSF commands accept a -R res_req argument to specify resource requirements. The exact behaviour depends on the command. For example, specifying a resource requirement for the lsload command displays the load levels for all hosts that have the requested resources.

Specifying resource requirements for the lsrun command causes LSF to select the best host out of the set of hosts that have the requested resources.

A resource requirement string describes the resources a job needs. LSF uses resource requirements to select hosts for remote execution and job execution.

Resource requirement string sections

Which sections apply

Depending on the command, one or more of these sections may apply. For example:

Syntax

select[selection_string] order[order_string] 
rusage[usage_string [, usage_string] ...] span[span_string] 
same[same_string]

The square brackets must be typed as shown.

The section names are select, order, rusage, span, and same. Sections that do not apply for a command are ignored.

If no section name is given, then the entire string is treated as a selection string. The select keyword may be omitted if the selection string is the first string in the resource requirement.

Each section has a different syntax.

How queue-level and job-level requirements are resolved

If job-level resource requirements are specified together with queue-level resource requirements:


Selection String

The selection string specifies the characteristics a host must have to match the resource requirement. It is a logical expression built from a set of resource names. The selection string is evaluated for each host; if the result is non-zero, then that host is selected.

Syntax

The selection string can combine resource names with logical and arithmetic operators. Non-zero arithmetic values are treated as logical TRUE, and zero (0) as logical FALSE. Boolean resources (for example, server to denote LSF server hosts) have a value of one (1) if they are defined for a host, and zero (0) if they are not defined for the host.

The resource names swap, idle, login, and cpu are accepted as aliases for swp, it, ls, and r1m respectively.

For ut, specify the percentage CPU utilization as an integer between 0 and 100.

For the string resources type and model, the special value any selects any value and local selects the same value as that of the local host. For example, type==local selects hosts of the same type as the host submitting the job. If a job can run on any type of host, include type==any in the resource requirements.

If no type is specified, the default depends on the command. For bsub, lsplace, lsrun, and lsgrun the default is type==local unless a string or Boolean resource is specified, in which case it is type==any. For lshosts, lsload, lsmon and lslogin the default is type==any.
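
The default rule above can be summarized in a few lines of Python. The function name and its arguments are hypothetical; only the command lists and the resulting defaults come from the text.

```python
# Commands named in the text, grouped by their documented default.
LOCAL_BY_DEFAULT = {"bsub", "lsplace", "lsrun", "lsgrun"}
ANY_BY_DEFAULT = {"lshosts", "lsload", "lsmon", "lslogin"}

def default_type(command, has_string_or_boolean=False):
    """Return the implied type selection when no type is specified."""
    if command in ANY_BY_DEFAULT:
        return "type==any"
    if command in LOCAL_BY_DEFAULT:
        # A string or Boolean resource in the requirement widens the default.
        return "type==any" if has_string_or_boolean else "type==local"
    raise ValueError(f"no default is documented here for {command!r}")

print(default_type("bsub"))                               # type==local
print(default_type("bsub", has_string_or_boolean=True))   # type==any
print(default_type("lsload"))                             # type==any
```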

Selecting shared string resources

You must use single quote characters (') around string-type shared resources. For example, use lsload -s to see the shared resources defined for the cluster:

$ lsload -s
RESOURCE                                VALUE       LOCATION
os_version                                4.2       pc36
os_version                                4.0       pc34
os_version                                4.1       devlinux4
cpu_type                                   ia       pc36
cpu_type                                   ia       pc34
cpu_type                              unknown       devlinux4

Use a select string in lsload -R to specify the shared resources you want to view, enclosing the shared resource values in single quotes. For example:

$ lsload -R "select[os_version=='4.2' || cpu_type=='unknown']" 
HOST_NAME       status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
pc36                ok   0.0   0.2   0.1   1%   3.4   3     0  895M  517M  123M
devlinux4           ok   0.0   0.1   0.0   0%   2.8   4     0 6348M  504M  205M

Operators

These operators can be used in selection strings. The operators are listed in order of decreasing precedence.

Syntax      Meaning

-a          Negative of a
!a          Logical not: 1 if a==0, 0 otherwise

a * b       Multiply a and b
a / b       Divide a by b

a + b       Add a and b
a - b       Subtract b from a

a > b       1 if a is greater than b, 0 otherwise
a < b       1 if a is less than b, 0 otherwise
a >= b      1 if a is greater than or equal to b, 0 otherwise
a <= b      1 if a is less than or equal to b, 0 otherwise

a == b      1 if a is equal to b, 0 otherwise
a != b      1 if a is not equal to b, 0 otherwise

a && b      Logical AND: 1 if both a and b are non-zero, 0 otherwise

a || b      Logical OR: 1 if either a or b is non-zero, 0 otherwise

Examples

select[(swp > 50 && type == MIPS) || (swp > 35 && type == ALPHA)]
select[((2*r15s + 3*r1m + r15m) / 6 < 1.0) && !fs && (cpuf > 4.0)]
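
To make the evaluation rules concrete, here is a hand translation of the second example into Python. It is a sketch, not an LSF parser: the host names and load values are invented, and an undefined Boolean resource is modelled as a missing dictionary key with value 0.

```python
def select_weighted_load(host):
    """select[((2*r15s + 3*r1m + r15m) / 6 < 1.0) && !fs && (cpuf > 4.0)]"""
    weighted_load = (2 * host["r15s"] + 3 * host["r1m"] + host["r15m"]) / 6
    has_fs = host.get("fs", 0)      # Boolean resource: 1 if defined, else 0
    result = weighted_load < 1.0 and not has_fs and host["cpuf"] > 4.0
    return int(result)              # non-zero means the host is selected

# Hypothetical hosts and load values.
hosts = {
    "hostA": {"r15s": 0.2, "r1m": 0.4, "r15m": 0.6, "cpuf": 6.0},
    "hostB": {"r15s": 0.2, "r1m": 0.4, "r15m": 0.6, "cpuf": 6.0, "fs": 1},
    "hostC": {"r15s": 2.0, "r1m": 2.5, "r15m": 1.0, "cpuf": 6.0},
}
selected = [name for name, h in hosts.items() if select_weighted_load(h)]
print(selected)   # ['hostA']
```

hostB is rejected because it defines fs, and hostC because its weighted load exceeds 1.0.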

Specifying shared resources with the keyword "defined"

A shared resource may be used in the resource requirement string of any LSF command. For example, when submitting an LSF job that requires a certain amount of shared scratch space, you might submit the job as follows:

% bsub -R "avail_scratch > 200 && swap > 50" myjob

The above assumes that all hosts in the cluster have access to the shared scratch space. The job is scheduled only if the value of the avail_scratch resource is more than 200 MB, and it goes to a host with more than 50 MB of available swap space.

It is possible for a system to be configured so that only some hosts within the LSF cluster have access to the scratch space. In order to exclude hosts which cannot access a shared resource, the defined(resource_name) function must be specified in the resource requirement string.

For example:

% bsub -R "defined(avail_scratch) && avail_scratch > 100 && swap > 100" myjob

would exclude any hosts which cannot access the scratch resource. The LSF administrator configures which hosts do and do not have access to a particular shared resource.
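
The effect of defined() can be pictured with a small Python sketch. The cluster layout, host names, and values are hypothetical; a host that cannot access the shared resource is modelled as a host with no entry for it.

```python
# Hypothetical cluster: avail_scratch is visible only from hostA and hostB.
avail_scratch = {"hostA": 250, "hostB": 250}
swap_mb = {"hostA": 300, "hostB": 80, "hostC": 500}

def eligible(host):
    """defined(avail_scratch) && avail_scratch > 100 && swap > 100"""
    if host not in avail_scratch:      # defined(avail_scratch) is false
        return False
    return avail_scratch[host] > 100 and swap_mb[host] > 100

print([h for h in swap_mb if eligible(h)])   # ['hostA']
```

hostC has plenty of swap space but is excluded because it cannot access the scratch resource.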


Order String

The order string allows the selected hosts to be sorted according to the values of resources. The values of r15s, r1m, and r15m used for sorting are the normalized load indices returned by lsload -N.

The order string is used for host sorting and selection. The ordering begins with the rightmost index in the order string and proceeds from right to left. The hosts are sorted into order based on each load index, and if more hosts are available than were requested, the LIM drops the least desirable hosts according to that index. The remaining hosts are then sorted by the next index.

After the hosts are sorted by the leftmost index in the order string, the final phase of sorting orders the hosts according to their status, with hosts that are currently not available for load sharing (that is, not in the ok state) listed at the end.

Because the hosts are sorted again for each load index, only the host status and the leftmost index in the order string actually affect the order in which hosts are listed. The other indices are only used to drop undesirable hosts from the list.

When sorting is done on each index, the direction in which the hosts are sorted (increasing vs. decreasing values) is determined by the default order returned by lsinfo for that index. This direction is chosen such that after sorting, by default, the hosts are ordered from best to worst on that index.

Syntax

[-]resource_name [:[-]resource_name]...

You can specify any built-in or external load index.

When an index name is preceded by a minus sign (-), the sorting order is reversed so that hosts are ordered from worst to best on that index.

Default

The default sorting order is r15s:pg (except for lslogin(1): ls:r1m).

Example

swp:r1m:tmp:r15s
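
The right-to-left processing of this example can be sketched in Python. This is not LIM's algorithm: the direction table is an assumption (the text says lsinfo reports the real default order per index), the rule of dropping one host per pass is hypothetical because the text does not say how many hosts are dropped, and the host data is invented.

```python
# Assumed default directions: True means a smaller value is better.
SMALLER_IS_BETTER = {"r15s": True, "r1m": True, "r15m": True, "pg": True,
                     "ut": True, "swp": False, "tmp": False, "mem": False}

def order_hosts(hosts, order_string, needed):
    """Sort by each index from right to left, pruning after each pass."""
    candidates = list(hosts)
    for term in reversed(order_string.split(":")):
        reversed_by_user = term.startswith("-")
        index = term.lstrip("-")
        ascending = SMALLER_IS_BETTER[index] != reversed_by_user
        candidates.sort(key=lambda h: h[index], reverse=not ascending)
        # Hypothetical pruning rule: drop the last host of this pass
        # while more hosts remain than were requested.
        if len(candidates) > needed:
            candidates.pop()
    # Final phase: hosts that are not "ok" move to the end (stable sort).
    candidates.sort(key=lambda h: h["status"] != "ok")
    return [h["name"] for h in candidates]

hosts = [
    {"name": "A", "swp": 500, "r1m": 0.2, "tmp": 900, "r15s": 0.1, "status": "ok"},
    {"name": "B", "swp": 300, "r1m": 0.1, "tmp": 100, "r15s": 0.3, "status": "ok"},
    {"name": "C", "swp": 800, "r1m": 1.5, "tmp": 400, "r15s": 0.0, "status": "ok"},
    {"name": "D", "swp": 100, "r1m": 0.3, "tmp": 800, "r15s": 2.0, "status": "ok"},
]
print(order_hosts(hosts, "swp:r1m:tmp:r15s", needed=2))   # ['C', 'A']
```

D is dropped on the r15s pass and B on the tmp pass. The final order of C and A is decided by swp alone, which matches the statement that only host status and the leftmost index affect the listed order.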


Usage String

This string defines the expected resource usage of the job. It is used to specify resource reservations for jobs, or for mapping jobs onto hosts and adjusting the load when running interactive jobs.

By default, no resources are reserved.

Batch jobs

The resource usage (rusage) section can be specified at the job level or with the queue configuration parameter RES_REQ.

Syntax

rusage[usage_string [, usage_string] ...]

where usage_string is:

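load_index=value [:load_index=value]... [:duration=minutes[m] | :duration=hoursh | :duration=secondss [:decay=0 | :decay=1]]

In the duration forms, minutes, hours, and seconds stand for numeric values. A value with no suffix or with an m suffix is in minutes, an h suffix means hours, and an s suffix means seconds.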

Load index

Internal and external load indices are considered in the resource usage string. The resource value represents the initial reserved amount of the resource.

Duration

The duration is the time period within which the specified resources should be reserved. Specify a duration equal to or greater than the ELIM updating interval.


Duration is not supported for static shared resources. If the shared resource is defined in an lsb.resources Limit section, then duration is not applied.

Decay

The decay value indicates how the reserved amount should decrease over the duration.

Values other than 0 or 1 are unsupported. If duration is not specified, the decay value is ignored.


Decay is not supported for static shared resources. If the shared resource is defined in an lsb.resources Limit section, then decay is not applied.

Default

If a resource or its value is not specified, the default is not to reserve that resource. If duration is not specified, the default is to reserve the total amount for the lifetime of the job. The default decay value is 0.

Example

rusage[mem=50:duration=100:decay=1]

This example indicates that 50 MB memory should be reserved for the job. As the job runs, the amount reserved will decrease at approximately 0.5 MB per minute until the 100 minutes is up.
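
The arithmetic behind this example can be checked with a short Python sketch. It assumes that decay=1 means a linear decrease to zero over the duration, which is consistent with the 0.5 MB per minute figure above, and that nothing is reserved once the duration ends. The function name is hypothetical.

```python
def reserved_amount(initial, duration_min, decay, elapsed_min):
    """Amount still reserved after elapsed_min minutes."""
    if elapsed_min >= duration_min:
        return 0.0                      # the reservation period is over
    if decay == 0:
        return float(initial)           # constant for the whole duration
    # decay == 1: assumed linear decrease to zero over the duration
    return initial * (duration_min - elapsed_min) / duration_min

# rusage[mem=50:duration=100:decay=1]
for minute in (0, 20, 50, 100):
    print(minute, reserved_amount(50, 100, 1, minute))
```

With these inputs the reservation falls from 50 MB to 40 MB after 20 minutes and to 25 MB after 50 minutes, a rate of 50 / 100 = 0.5 MB per minute.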

How queue-level and job-level rusage sections are resolved

Job-level rusage overrides the queue-level specification:

How queue-level and job-level rusage sections are merged

When both job-level and queue-level rusage sections are defined, the rusage section defined for the job overrides the rusage section defined in the queue. The two rusage definitions are merged, with the job-level rusage taking precedence. For example:
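
A minimal Python sketch of such a merge, using hypothetical values rather than figures from the LSF documentation. Treating duration and decay as ordinary keys that the job may override is an assumption made for illustration.

```python
def merge_rusage(queue_rusage, job_rusage):
    """Combine two rusage sections; job-level entries take precedence."""
    merged = dict(queue_rusage)   # start from the queue-level settings
    merged.update(job_rusage)     # job-level values win on conflicts
    return merged

queue_level = {"mem": 200, "duration": 20, "decay": 1}   # hypothetical queue
job_level = {"mem": 100}                                 # hypothetical job
print(merge_rusage(queue_level, job_level))
# {'mem': 100, 'duration': 20, 'decay': 1}
```

The job's mem value replaces the queue's, while settings the job does not mention are inherited from the queue.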

Specifying multiple usage strings

Use several comma-separated usage strings to define different duration and decay for any number of resources.

A given load index cannot appear more than once in the resource usage string.
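
The two rules above can be sketched as a small Python parser for the text inside rusage[...]. The parser, its return structure, and the sample values are hypothetical, and it does no unit conversion on duration.

```python
def parse_rusage(body):
    """Parse the text inside rusage[...] into {load_index: settings}."""
    reservations = {}
    for usage in body.split(","):
        options = {"duration": None, "decay": 0}   # defaults from the text
        amounts = {}
        for item in usage.strip().split(":"):
            name, _, value = item.strip().partition("=")
            if name == "duration":
                options["duration"] = value        # kept as written, e.g. "1h"
            elif name == "decay":
                options["decay"] = int(value)
            elif name in amounts or name in reservations:
                raise ValueError(f"load index {name!r} appears more than once")
            else:
                amounts[name] = float(value)
        for name, amount in amounts.items():
            reservations[name] = {"amount": amount, **options}
    return reservations

# Two usage strings with different duration and decay settings.
print(parse_rusage("mem=20:duration=30:decay=1, swp=50:duration=1h"))
```

Each comma-separated usage string carries its own duration and decay, and a repeated load index is rejected.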

Examples

Non-batch environments

Resource reservation is only available for batch jobs. If you run jobs using only LSF Base, such as through lsrun, LIM uses resource usage to determine the placement of jobs. Resource usage requests are used to temporarily increase the load so that a host is not overloaded. When LIM gives placement advice, external load indices are not considered in the resource usage string. In this case, the syntax of the resource usage string is:

res[=value]:res[=value]: ... :res[=value]

res is one of the resources whose value is returned by the lsload command.

rusage[r1m=0.5:mem=20:swp=40]

The above example indicates that the task is expected to increase the 1-minute run queue length by 0.5, consume 20 MB of memory and 40 MB of swap space.

If no value is specified, the task is assumed to be intensive in using that resource. In this case no more than one task will be assigned to a host regardless of how many CPUs it has.

The default resource usage for a task is r15s=1.0:r1m=1.0:r15m=1.0. This indicates a CPU-intensive task which consumes few other resources.
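
A rough Python sketch of how such a string might be read and applied to a host's load. It is not LIM's implementation: the function names are hypothetical, a missing value is represented as None to mark an intensive resource, and the rule that mem, swp, and tmp shrink while other indices grow is an assumption.

```python
DEFAULT_USAGE = "r15s=1.0:r1m=1.0:r15m=1.0"

def parse_task_usage(usage=DEFAULT_USAGE):
    """Map each resource to its expected usage; None marks 'intensive'."""
    expected = {}
    for item in usage.split(":"):
        name, sep, value = item.partition("=")
        expected[name] = float(value) if sep else None
    return expected

def adjusted_load(load, expected):
    """Return the load a host would show with the task's usage added."""
    adjusted = dict(load)
    for name, amount in expected.items():
        if amount is None:
            continue                       # intensive: no numeric adjustment
        if name in ("mem", "swp", "tmp"):  # assumed availability indices
            adjusted[name] -= amount
        else:
            adjusted[name] += amount
    return adjusted

usage = parse_task_usage("r1m=0.5:mem=20:swp=40")
print(adjusted_load({"r1m": 1.0, "mem": 100.0, "swp": 200.0}, usage))
# {'r1m': 1.5, 'mem': 80.0, 'swp': 160.0}
```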


Span String

A span string specifies the locality of a parallel job. If span is omitted, LSF allocates the required processors for the job from the available set of processors.

Syntax

Two kinds of span string are supported:

See Controlling Processor Allocation Across Hosts for more information about specifying span strings.


Same String


You must have the parallel batch job scheduler plugin installed in order to use the same string.

Parallel jobs run on multiple hosts. If your cluster has heterogeneous hosts, some processes from a parallel job may, for example, run on Solaris and some on SGI IRIX. However, for performance reasons you may want all processes of a job to run on the same type of host instead of having some processes run on one type of host and others on another type of host.

The same string specifies that all processes of a parallel job must run on hosts with the same resource.

You can specify the same string at the job level or at the queue level.

When both queue-level and job-level same sections are defined, LSF combines both requirements to allocate processors.

Syntax

resource_name[:resource_name]...

You can specify any static resource.

When you specify, for example, resource1:resource2, if hosts always have both resources, the string is interpreted as:

If hosts do not always have both resources, it is interpreted as:

Examples

% bsub -n 4 -R"select[type==SGI6 || type==SOL7] same[type]" myjob

Run all parallel processes on the same host type. Allocate 4 processors on the same host type--either SGI IRIX, or Solaris 7, but not both.

% bsub -n 6 -R"select[type==any] same[type:model]" myjob

Run all parallel processes on the same host type and model. Allocate 6 processors on any host type or model as long as all the processors are on the same host type and model.
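
The grouping idea behind the same string can be sketched in Python. This is an illustration only: the host records, the cpus field, and the policy of returning the first group with enough processors are assumptions, not LSF's allocation algorithm.

```python
from collections import defaultdict

def allocate_same(hosts, same, needed):
    """Find a group of hosts that agree on every `same` resource
    and together offer at least `needed` processors."""
    groups = defaultdict(list)
    for host in hosts:
        key = tuple(host[resource] for resource in same.split(":"))
        groups[key].append(host)
    for key, members in groups.items():
        if sum(h["cpus"] for h in members) >= needed:
            return key, [h["name"] for h in members]
    return None   # no single group is large enough

# Hypothetical cluster.
hosts = [
    {"name": "sgi1", "type": "SGI6", "model": "m1", "cpus": 2},
    {"name": "sgi2", "type": "SGI6", "model": "m1", "cpus": 2},
    {"name": "sol1", "type": "SOL7", "model": "m2", "cpus": 4},
    {"name": "sol2", "type": "SOL7", "model": "m2", "cpus": 4},
]
print(allocate_same(hosts, "type", needed=4))        # (('SGI6',), ['sgi1', 'sgi2'])
print(allocate_same(hosts, "type:model", needed=6))  # (('SOL7', 'm2'), ['sol1', 'sol2'])
```

Processors are never taken from two groups at once, so a request for 6 processors skips the SGI hosts, which offer only 4 in total.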

      Date Modified: January 12, 2004
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.