


Resource Allocation Limits




About Resource Allocation Limits

What resource allocation limits do

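By default, resource consumers like users, hosts, queues, or projects are not limited in the resources available to them for running jobs. Resource allocation limits configured in lsb.resources restrict how much of a resource must be available for different classes of jobs to start, and which resource consumers the limits apply to.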

If all of the resource has been consumed, no more jobs can be started until some of the resource is released.

For example, by limiting the maximum amount of memory for each of your hosts, you can make sure that your system operates at optimal performance. By defining a memory limit for some users submitting jobs to a particular queue and a specified set of hosts, you can prevent these users from using up all the memory in the system at one time.

Jobs must specify resource requirements

For limits to apply, the job must specify resource requirements (bsub -R rusage string or RES_REQ in lsb.queues). For example, a memory allocation limit of 4 MB is configured in lsb.resources:

Begin Limit
NAME = mem_limit1
MEM = 4
End Limit

A job is submitted with an rusage resource requirement that exceeds this limit:

% bsub -R"rusage[mem=5]" uname

and remains pending:

% bjobs -p 600
  JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
  600    user1  PEND  normal  hostA                 uname     Aug 12 14:05
Resource (mem) limit defined cluster-wide has been reached;

A job is submitted with a resource requirement within the configured limit:

% bsub -R"rusage[mem=3]" sleep 100

and is allowed to run:

% bjobs
  JOBID   USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME      SUBMIT_TIME
  600    user1   PEND  normal      hostA                uname      Aug 12 14:05
  604    user1    RUN  normal      hostA            sleep 100      Aug 12 14:09
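
The two submissions above can be read as a simple admission check. The following Python sketch only illustrates that reading and is not how LSF implements scheduling; the function name can_start and the assumption that no memory is allocated when each job is considered are hypothetical.

```python
def can_start(requested_mb, in_use_mb, limit_mb):
    """Return True if the request still fits within the allocation limit."""
    return in_use_mb + requested_mb <= limit_mb


MEM_LIMIT_MB = 4  # MEM = 4 from the mem_limit1 Limit section

# Assumption for illustration: nothing is allocated when each job is considered.
print(can_start(5, 0, MEM_LIMIT_MB))  # job 600, rusage[mem=5]: False, stays pending
print(can_start(3, 0, MEM_LIMIT_MB))  # job 604, rusage[mem=3]: True, can run
```

A request larger than the whole limit, like mem=5 against a 4 MB limit, can never pass this check, which is consistent with job 600 staying pending even after other jobs finish.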

Resource allocation limits and resource usage limits

Resource allocation limits are not the same as resource usage limits, which are enforced during job run time. For example, CPU limits, memory limits, and other resource usage limits take effect after a job starts running. See Runtime Resource Usage Limits for more information.

How LSF enforces limits

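Resource allocation limits are enforced so that they apply to several kinds of resources (job slots, memory, swap space, tmp space, software licenses, and other shared resources) and to several kinds of resource consumers (users and user groups, hosts and host groups, queues, and projects).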

How LSF counts resources

Resources on a host are not available if they are taken by jobs that have been started, but have not yet finished. This means running and suspended jobs count against the limits for queues, users, hosts, projects, and processors that they are associated with.

Job slot limits

Job slot limits often correspond to the maximum number of jobs that can run at any point in time. For example, a queue cannot start jobs if it has no job slots available, and jobs cannot run on hosts that have no available job slots.

Resource reservation and backfill

When processor or memory reservation occurs, the reserved resources count against the limits for users, queues, hosts, projects, and processors. When backfilling of parallel jobs occurs, the backfill jobs do not count against any limits.

MultiCluster

Limits apply only to the cluster where lsb.resources is configured. If the cluster leases hosts from another cluster, limits are enforced on those hosts as if they were local hosts.

Limits for resource consumers

Host groups

If a limit is specified for a host group, the total amount of a resource used by all hosts in that group is counted. If a host is a member of more than one group, each job running on that host is counted against the limit for all groups to which the host belongs.
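
The counting rule for overlapping host groups can be sketched in Python as follows. This is an illustrative model only, not LSF code; the group name hgroup2, the host names, and the job counts are made-up values.

```python
def usage_by_group(jobs_per_host, groups):
    """Sum the jobs on every member host, separately for each host group."""
    return {
        group: sum(jobs_per_host.get(host, 0) for host in members)
        for group, members in groups.items()
    }


# Hypothetical configuration: hostB belongs to both groups.
groups = {
    "hgroup1": {"hostA", "hostB"},
    "hgroup2": {"hostB", "hostC"},
}
jobs_per_host = {"hostA": 2, "hostB": 3, "hostC": 1}

# The 3 jobs on hostB count against both hgroup1 and hgroup2.
print(usage_by_group(jobs_per_host, groups))
```

In this model hgroup1 is charged 5 jobs and hgroup2 is charged 4, so the jobs on the shared host are counted once per group rather than being split between groups.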

Limits for users and user groups

Jobs are normally queued on a first-come, first-served (FCFS) basis. It is possible for some users to abuse the system by submitting a large number of jobs; jobs from other users must wait until these jobs complete. Limiting resources by user prevents users from monopolizing all the resources.

Users can submit an unlimited number of jobs, but if they have reached their limit for any resource, the rest of their jobs stay pending, until some of their running jobs finish or resources become available.

If a limit is specified for a user group, the total amount of a resource used by all users in that group is counted. If a user is a member of more than one group, each of that user's jobs is counted against the limit for all groups to which that user belongs.

Use the keyword all to configure limits that apply to each user or user group in a cluster. This is useful if you have a large cluster but only want to exclude a few users from the limit definition.

Per-user limits on users and groups

Per-user limits are enforced on each user listed, or individually on each user in a listed user group. If a user group contains a subgroup, the limit also applies to each member of the subgroup, recursively.

Per-user limits that use the keyword all apply to each user in a cluster. If user groups are configured, the limit applies to each member of the user group, not to the group as a whole.
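
The recursive expansion of user groups for a per-user limit can be sketched in Python as follows. This is a minimal model under stated assumptions, not LSF code: the group names, their membership, the limit of 2 slots, and the assumption that group definitions contain no cycles are all hypothetical.

```python
def expand_users(name, groups):
    """Recursively expand a user or user group name into a set of user names."""
    if name not in groups:
        return {name}  # a plain user, not a group
    users = set()
    for member in groups[name]:
        users |= expand_users(member, groups)
    return users


# Hypothetical groups: ugroup1 contains user1 and the subgroup ugroup2.
groups = {
    "ugroup1": ["user1", "ugroup2"],
    "ugroup2": ["user2", "user3"],
}

# A per-user limit on ugroup1 gives every member its own limit of 2 slots.
per_user_slots = {user: 2 for user in expand_users("ugroup1", groups)}
print(sorted(per_user_slots.items()))
```

Each of the three users ends up with a separate limit, in contrast to a USERS limit on ugroup1, where the group members would share one total.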



Configuring Resource Allocation Limits

lsb.resources file

Configure all resource allocation limits in one or more Limit sections in the lsb.resources file. Limit sections set limits for how much of the specified resources must be available for different classes of jobs to start, and which resource consumers the limits apply to.

Resource parameters

To limit...                                       Set in a Limit section of lsb.resources...

Total number of job slots that can be used        SLOTS
by specific jobs

Job slots based on the number of processors       SLOTS_PER_PROCESSOR and PER_HOST
on each host affected by the limit

Memory (if PER_HOST is set for the limit, the     MEM (MB or percentage)
amount can be a percentage of memory on each
host in the limit)

Swap space (if PER_HOST is set for the limit,     SWP (MB or percentage)
the amount can be a percentage of swap space
on each host in the limit)

Tmp space (if PER_HOST is set for the limit,      TMP (MB or percentage)
the amount can be a percentage of tmp space
on each host in the limit)

Software licenses                                 LICENSE or RESOURCE

Any shared resource                               RESOURCE

Consumer parameters

For jobs submitted...                             Set in a Limit section of lsb.resources...

By all specified users or user groups             USERS
To all specified queues                           QUEUES
To all specified hosts or host groups             HOSTS
For all specified projects                        PROJECTS
By each specified user or each member of the      PER_USER
specified user groups
To each specified queue                           PER_QUEUE
To each specified host or each member of the      PER_HOST
specified host groups
For each specified project                        PER_PROJECT

Enabling resource allocation limits

Resource allocation limits scheduling plugin

To enable resource allocation limits in your cluster, configure the resource allocation limits scheduling plugin schmod_limit in lsb.modules.

Configuring lsb.modules

Begin PluginModule
SCH_PLUGIN        RB_PLUGIN        SCH_DISABLE_PHASES
schmod_default    ()               ()
schmod_limit      ()               ()
End PluginModule

Configuring cluster-wide limits

To configure limits that take effect for your entire cluster, configure limits in lsb.resources, but do not specify any consumers.

Compatibility with pre-version 6.0 job slot limits

The Limit section of lsb.resources does not support the keywords or format used in lsb.users, lsb.hosts, and lsb.queues. However, any existing job slot limit configuration in these files will continue to apply.

How resource allocation limits map to pre-version 6.0 job slot limits

Job slot limits are the only type of limit you can configure in lsb.users, lsb.hosts, and lsb.queues. You cannot configure limits for user groups, host groups, and projects in lsb.users, lsb.hosts, and lsb.queues. You should not configure any new resource allocation limits in lsb.users, lsb.hosts, and lsb.queues. Use lsb.resources to configure all new resource allocation limits, including job slot limits.

Job slot resources    Resource consumers (lsb.resources)                      Equivalent existing
(lsb.resources)       USERS      PER_USER  QUEUES      HOSTS      PER_HOST    limit (file)

SLOTS                 --         all       --          host_name  --          JL/U (lsb.hosts)
SLOTS_PER_PROCESSOR   user_name  --        --          --         all         JL/P (lsb.users)
SLOTS                 --         all       queue_name  --         --          UJOB_LIMIT (lsb.queues)
SLOTS                 --         all       --          --         --          MAX_JOBS (lsb.users)
SLOTS                 --         --        queue_name  --         all         HJOB_LIMIT (lsb.queues)
SLOTS                 --         --        --          host_name  --          MXJ (lsb.hosts)
SLOTS_PER_PROCESSOR   --         --        queue_name  --         all         PJOB_LIMIT (lsb.queues)
SLOTS                 --         --        queue_name  --         --          QJOB_LIMIT (lsb.queues)

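Because job slot limits are the only limits that lsb.users, lsb.hosts, and lsb.queues support, limits for the other resources configurable in lsb.resources (MEM, SWP, TMP, LICENSE, and RESOURCE) have no corresponding limit in those files.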

How conflicting limits are resolved

Similar conflicting limits

For similar limits configured in lsb.resources, lsb.users, lsb.hosts, or lsb.queues, the most restrictive limit is used. For example, a slot limit of 3 for all users is configured in lsb.resources:

Begin Limit
NAME  = user_limit1
USERS = all
SLOTS = 3
End Limit

This is similar, but not equivalent, to an existing MAX_JOBS limit of 2 configured in lsb.users:

% busers
USER/GROUP    JL/P    MAX  NJOBS   PEND    RUN  SSUSP  USUSP    RSV
user1           -       2      4      2      2      0      0      0

user1 submits 4 jobs:

% bjobs
JOBID   USER    STAT  QUEUE     FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
816     user1   RUN   normal    hostA       hostA       sleep 1000 Jan 22 16:34
817     user1   RUN   normal    hostA       hostA       sleep 1000 Jan 22 16:34
818     user1   PEND  normal    hostA                   sleep 1000 Jan 22 16:34
819     user1   PEND  normal    hostA                   sleep 1000 Jan 22 16:34

Two jobs (818 and 819) remain pending because the more restrictive limit of 2 from lsb.users is enforced:

% bjobs -p
JOBID   USER    STAT  QUEUE      FROM_HOST      JOB_NAME           SUBMIT_TIME
818     user1   PEND  normal     hostA          sleep 1000         Jan 22 16:34
The user has reached his/her job slot limit;
819     user1   PEND  normal     hostA          sleep 1000         Jan 22 16:34
The user has reached his/her job slot limit;

If the MAX_JOBS limit in lsb.users is 4:

% busers
USER/GROUP  JL/P   MAX  NJOBS   PEND   RUN  SSUSP  USUSP  RSV
user1         -      4      4      1     3      0      0    0

and user1 submits 4 jobs:

% bjobs
JOBID  USER    STAT  QUEUE   FROM_HOST   EXEC_HOST    JOB_NAME     SUBMIT_TIME
824    user1   RUN   normal  hostA       hostA        sleep 1000   Jan 22 16:38
825    user1   RUN   normal  hostA       hostA        sleep 1000   Jan 22 16:38
826    user1   RUN   normal  hostA       hostA        sleep 1000   Jan 22 16:38
827    user1   PEND  normal  hostA                    sleep 1000   Jan 22 16:38

Only one job (827) remains pending because the more restrictive limit of 3 in lsb.resources is enforced:

% bjobs -p
JOBID    USER    STAT  QUEUE   FROM_HOST       JOB_NAME           SUBMIT_TIME
827     user1    PEND  normal      hostA     sleep 1000          Jan 22 16:38
Resource (slot) limit defined cluster-wide has been reached;
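
The two cases above follow a "most restrictive limit wins" rule, which can be sketched in Python as follows. This is an illustrative model of the documented outcome, not LSF code; the function names are hypothetical, and the sketch assumes each job uses exactly one slot.

```python
def effective_slot_limit(*limits):
    """Return the most restrictive (smallest) configured limit, or None if unlimited."""
    configured = [limit for limit in limits if limit is not None]
    return min(configured) if configured else None


def split_jobs(submitted, limit):
    """Return (running, pending) job counts, assuming one slot per job."""
    running = submitted if limit is None else min(submitted, limit)
    return running, submitted - running


# SLOTS = 3 in lsb.resources, MAX_JOBS = 2 in lsb.users: the limit of 2 applies.
limit = effective_slot_limit(3, 2)
print(limit, split_jobs(4, limit))  # 2 (2, 2): jobs 818 and 819 pend

# SLOTS = 3 in lsb.resources, MAX_JOBS = 4 in lsb.users: the limit of 3 applies.
limit = effective_slot_limit(3, 4)
print(limit, split_jobs(4, limit))  # 3 (3, 1): job 827 pends
```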

Equivalent conflicting limits

New limits in lsb.resources that are equivalent to existing limits in lsb.users, lsb.hosts, or lsb.queues, but have a different value, override the existing limits. The equivalent limits in lsb.users, lsb.hosts, or lsb.queues are ignored, and the value of the new limit in lsb.resources is used.

For example, a per-user job slot limit in lsb.resources is equivalent to a MAX_JOBS limit in lsb.users, so only the lsb.resources limit is enforced and the limit in lsb.users is ignored:

Begin Limit
NAME  = slot_limit
PER_USER = all
SLOTS = 3
End Limit

Example limit configurations

Each set of limits is defined in a Limit section enclosed by Begin Limit and End Limit.

Example 1

user1 is limited to 2 job slots on hostA, and user2's jobs on queue normal are limited to 20 MB of memory:

Begin Limit
HOSTS     SLOTS  MEM   SWP  TMP   USERS       QUEUES
hostA     2      -      -    -    user1       -
-         -      20     -    -    user2       normal
End Limit

Example 2

Set a job slot limit of 2 for user user1 submitting jobs to queue normal on host hosta for all projects, but only one job slot for all queues and hosts for project test:

Begin Limit
HOSTS  SLOTS  PROJECTS   USERS     QUEUES
hosta  2         -       user1     normal
  -    1      test       user1       -   
End Limit

Example 3

Limit usage of hosts in license1 group:

Begin Limit
HOSTS       SLOTS   MEM    PER_QUEUE
license1    10      -      normal
license1    -       200    short
license1    30      300    (all ~normal ~short)
End Limit

Example 4

All users in user group ugroup1 except user1 using queue1 and queue2 and running jobs on hosts in host group hgroup1 are limited to 2 job slots per processor on each host:

Begin Limit
NAME          = limit1
# Resources:
SLOTS_PER_PROCESSOR = 2
#Consumers:
QUEUES       = queue1 queue2
USERS        = ugroup1 ~user1
PER_HOST     = hgroup1
End Limit

Example 5

user1 and user2 can use all queues and all hosts in the cluster with a limit of 20 MB of available memory:

Begin Limit
NAME  = 20_MB_mem 
# Resources:
MEM   = 20
# Consumers:
USERS = user1 user2
End Limit

Example 6

All users in user group ugroup1 can use queue1 and queue2 and run jobs on any host in host group hgroup1 sharing 10 job slots:

Begin Limit
NAME   = 10_slot 
# Resources:
SLOTS  = 10
#Consumers:
QUEUES = queue1 queue2
USERS  = ugroup1
HOSTS  = hgroup1
End Limit

Example 7

All users in user group ugroup1 except user1 can use all queues but queue1 and run jobs with a limit of 10% of available memory on each host in host group hgroup1:

Begin Limit
NAME     = 10_percent_mem
# Resources:
MEM      = 10%
QUEUES   = all ~queue1
USERS    = ugroup1 ~user1
PER_HOST = hgroup1
End Limit

Example 8

Limit users in the develop group to 1 job on each host, and 50% of the memory on the host.

Begin Limit
NAME = develop_group_limit
# Resources:
SLOTS = 1
MEM = 50%
#Consumers:
USERS = develop
PER_HOST = all
End Limit

Example 9

Limit software license lic1, with quantity 100, where user1 can use 90 licenses and all other users are restricted to 10.

Begin Limit
USERS          LICENSE
user1          ([lic1,90])
(all ~user1)   ([lic1,10])
End Limit

lic1 is defined as a decreasing numeric shared resource in lsf.shared.

To submit a job that uses one lic1 license, use the rusage string in the -R option of bsub to specify the license:

% bsub -R "rusage[lic1=1]" my-job

Example 10

Jobs from the crash project can use 10 lic1 licenses, while jobs from all other projects together can use 5.

Begin Limit
LICENSE        PROJECTS
([lic1,10])    crash
([lic1,5])     (all ~crash)
End Limit

lic1 is defined as a decreasing numeric shared resource in lsf.shared.

Example 11

Limit each host to 1 job slot per processor:

Begin Limit
NAME                = default_limit
SLOTS_PER_PROCESSOR = 1
PER_HOST            = all
End Limit



Viewing Information about Resource Allocation Limits

Your job may be pending because some configured resource allocation limit has been reached. Use the blimits command to show the dynamic counters of resource allocation limits configured in Limit sections in lsb.resources. blimits displays the current resource usage to show what limits may be blocking your job.

blimits command

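The blimits command displays the name of each configured limit, the users, queues, hosts, and projects it applies to, and the current usage of each limited resource.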

Resources that have no configured limits or no limit usage are indicated by a dash (-). Limits are displayed in a USED/LIMIT format. For example, if a limit of 10 slots is configured and 3 slots are in use, then blimits displays the limit for SLOTS as 3/10.

If the MEM, SWP, or TMP limits are configured as percentages, both the limit and the amount used are displayed in MB. For example, lshosts displays maxmem of 249 MB, and MEM is limited to 10% of available memory. If 10 MB are used, blimits displays the limit for MEM as 10/25 (10 MB USED from a 25 MB LIMIT).
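
The arithmetic behind the 10/25 display can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and rounding to the nearest MB is an assumption chosen because it reproduces the 25 MB figure (10% of 249 MB is 24.9 MB). The source does not state how LSF rounds.

```python
def percent_limit_mb(total_mb, percent):
    """Convert a percentage limit to MB (assumed: rounded to the nearest MB)."""
    return round(total_mb * percent / 100)


def blimits_cell(used, limit):
    """Format a value the way blimits shows it: USED/LIMIT, or '-' if no limit."""
    return "-" if limit is None else f"{used}/{limit}"


limit_mb = percent_limit_mb(249, 10)   # maxmem of 249 MB, MEM = 10%
print(blimits_cell(10, limit_mb))      # 10/25
print(blimits_cell(3, 10))             # 3/10, the SLOTS example
print(blimits_cell(0, None))           # -
```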

Configured limits and resource usage for built-in resources (slots, mem, tmp, and swp load indices) are displayed as INTERNAL RESOURCE LIMITS, separately from custom external resources, which are shown as EXTERNAL RESOURCE LIMITS.

Limits are displayed for both the vertical tabular format and the horizontal format for Limit sections. Since a vertical format Limit section has no name, blimits displays NONAMEnnn under the NAME column for these limits, where the unnamed limits are numbered in the order the vertical-format Limit sections appear in the lsb.resources file.

If a resource consumer is configured as all, the limit usage for that consumer is indicated by a dash (-).

PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.

In MultiCluster, blimits returns the information about all limits in the local cluster.

Examples

For the following limit definitions:

Begin Limit
NAME = limit1
USERS = user1
PER_QUEUE = all
PER_HOST = hostA hostC
TMP = 30%
SWP = 50%
MEM = 10%
End Limit

Begin Limit
NAME = limit_ext1
PER_HOST = all
RESOURCE = ([user1_num,30] [hc_num,20])
End Limit

blimits displays the following:

% blimits
 
INTERNAL RESOURCE LIMITS:

NAME     USERS     QUEUES     HOSTS   PROJECTS   SLOTS    MEM      TMP      SWP
limit1   user1         q2     hostA         -       -   10/25        -   10/258
limit1   user1         q3     hostA         -       -       -   30/2953       -
limit1   user1         q4     hostC         -       -       -    40/590       -

EXTERNAL RESOURCE LIMITS:

NAME        USERS   QUEUES   HOSTS   PROJECTS    user1_num    hc_num     HC_num
limit_ext1      -        -   hostA          -           -       1/20          -
limit_ext1      -        -   hostC          -         1/30      1/20          -




      Date Modified: January 12, 2004
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.