About Resource Allocation Limits
- What resource allocation limits do
- How LSF enforces limits
- How LSF counts resources
- Limits for resource consumers
What resource allocation limits do
By default, resource consumers like users, hosts, queues, or projects are not limited in the resources available to them for running jobs. Resource allocation limits configured in lsb.resources restrict:
- The maximum amount of a resource requested by a job that can be allocated during job scheduling for different classes of jobs to start
- Which resource consumers the limits apply to
If all of the resource has been consumed, no more jobs can be started until some of the resource is released.
For example, by limiting the maximum amount of memory for each of your hosts, you can make sure that your system operates at optimal performance. By defining a memory limit for some users submitting jobs to a particular queue and a specified set of hosts, you can prevent these users from using up all the memory in the system at one time.
For limits to apply, the job must specify resource requirements (bsub -R rusage string or RES_REQ in lsb.queues). For example, a memory allocation limit of 4 MB is configured in lsb.resources:

    Begin Limit
    NAME = mem_limit1
    MEM = 4
    End Limit

A job is submitted with an rusage resource requirement that exceeds this limit:

    % bsub -R"rusage[mem=5]" uname

and remains pending:

    % bjobs -p 600
    JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME  SUBMIT_TIME
    600   user1 PEND normal suplin02            uname     Aug 12 14:05
    Resource (mem) limit defined cluster-wide has been reached;

A job submitted with a resource requirement within the configured limit:

    % bsub -R"rusage[mem=3]" sleep 100

is allowed to run:

    % bjobs
    JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME  SUBMIT_TIME
    600   user1 PEND normal hostA               uname     Aug 12 14:05
    604   user1 RUN  normal hostA               sleep 100 Aug 12 14:09

Resource allocation limits and resource usage limits
Resource allocation limits are not the same as resource usage limits, which are enforced during job run time. For example, you set CPU limits, memory limits, and other limits that take effect after a job starts running. See Runtime Resource Usage Limits for more information.
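The distinction can be made concrete with a small model of the scheduling-time check. The following is a minimal sketch, not LSF's implementation: the class AllocationLimit and its methods are hypothetical names introduced for illustration, and the values mirror the mem_limit1 example (MEM = 4, with requests of 5 MB and 3 MB). The check looks only at what a job requests before it starts, not at what the job consumes while running.

```python
from dataclasses import dataclass


@dataclass
class AllocationLimit:
    """Hypothetical model of one cluster-wide MEM allocation limit, in MB."""
    limit_mb: int
    allocated_mb: int = 0

    def try_start(self, requested_mb: int) -> bool:
        """Scheduling-time check: allocate only if the request still fits."""
        if self.allocated_mb + requested_mb > self.limit_mb:
            return False  # the job stays pending
        self.allocated_mb += requested_mb
        return True  # the job may start

    def release(self, requested_mb: int) -> None:
        """Give the allocation back when the job finishes."""
        self.allocated_mb -= requested_mb


mem_limit1 = AllocationLimit(limit_mb=4)
print(mem_limit1.try_start(5))  # request like rusage[mem=5]: exceeds MEM = 4
print(mem_limit1.try_start(3))  # request like rusage[mem=3]: fits
```

In this model a pending job can start later, once release() has returned enough of the resource, which corresponds to the statement that no more jobs can be started until some of the resource is released.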
How LSF enforces limits
Resource allocation limits are enforced so that they apply to:
- Several kinds of resources:
- Several kinds of resource consumers:
- All jobs in the cluster
- Combinations of consumers:
How LSF counts resources
Resources on a host are not available if they are taken by jobs that have been started, but have not yet finished. This means running and suspended jobs count against the limits for queues, users, hosts, projects, and processors that they are associated with.
Job slot limits often correspond to the maximum number of jobs that can run at any point in time. For example, a queue cannot start jobs if it has no job slots available, and jobs cannot run on hosts that have no available job slots.
When processor or memory reservation occurs, the reserved resources count against the limits for users, queues, hosts, projects, and processors. When backfilling of parallel jobs occurs, the backfill jobs do not count against any limits.
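The counting rules above can be summarized in a short sketch. This is an illustrative model under stated assumptions, not LSF code: the Job class and the slots_counted function are hypothetical, the state names follow the busers and bjobs column values (RUN, SSUSP, USUSP, PEND), and a reservation is modeled as a simple flag on a job.

```python
from dataclasses import dataclass

# States of jobs that have been started but have not yet finished.
COUNTED_STATES = {"RUN", "SSUSP", "USUSP"}


@dataclass
class Job:
    state: str              # "PEND", "RUN", "SSUSP", or "USUSP"
    slots: int
    reserved: bool = False  # assumption: job currently holds a reservation
    backfill: bool = False  # assumption: job was started as a backfill job


def slots_counted(jobs):
    """Sum the slots that count against a limit for the given jobs."""
    total = 0
    for job in jobs:
        if job.backfill:
            continue  # backfill jobs do not count against any limits
        if job.state in COUNTED_STATES or job.reserved:
            total += job.slots
    return total


jobs = [
    Job("RUN", 2),
    Job("USUSP", 1),                # suspended jobs still count
    Job("PEND", 4),                 # plain pending jobs do not count
    Job("PEND", 3, reserved=True),  # reserved resources count
    Job("RUN", 8, backfill=True),   # backfill jobs are skipped
]
print(slots_counted(jobs))
```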
Limits apply only to the cluster where lsb.resources is configured. If the cluster leases hosts from another cluster, limits are enforced on those hosts as if they were local hosts.

Limits for resource consumers
If a limit is specified for a host group, the total amount of a resource used by all hosts in that group is counted. If a host is a member of more than one group, each job running on that host is counted against the limit for all groups to which the host belongs.
Jobs are normally queued on a first-come, first-served (FCFS) basis. It is possible for some users to abuse the system by submitting a large number of jobs; jobs from other users must wait until these jobs complete. Limiting resources by user prevents users from monopolizing all the resources.
Users can submit an unlimited number of jobs, but if they have reached their limit for any resource, the rest of their jobs stay pending, until some of their running jobs finish or resources become available.
If a limit is specified for a user group, the total amount of a resource used by all users in that group is counted. If a user is a member of more than one group, each of that user's jobs is counted against the limit for all groups to which that user belongs.
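The group-counting rule can be sketched as follows. This is a minimal illustration, assuming flat (non-nested) groups; the group names, the membership table, and the group_usage function are hypothetical and are not part of LSF. The same logic applies to hosts that belong to more than one host group.

```python
# Hypothetical membership table; user1 belongs to both groups.
USER_GROUPS = {
    "ugroup1": {"user1", "user2"},
    "ugroup2": {"user1", "user3"},
}


def group_usage(jobs, groups=USER_GROUPS):
    """Return slots used per group for an iterable of (user, slots) jobs.

    A job is charged to every group its user belongs to.
    """
    usage = {name: 0 for name in groups}
    for user, slots in jobs:
        for name, members in groups.items():
            if user in members:
                usage[name] += slots
    return usage


# user1's 2-slot job counts against both groups; user3's job only against ugroup2.
print(group_usage([("user1", 2), ("user3", 1)]))
```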
Use the keyword all to configure limits that apply to each user or user group in a cluster. This is useful if you have a large cluster but only want to exclude a few users from the limit definition.

Per-user limits are enforced on each user, or individually on each user in the user group listed. If a user group contains a subgroup, the limit also applies to each member in the subgroup recursively.

Per-user limits that use the keyword all apply to each user in a cluster. If user groups are configured, the limit applies to each member of the user group, not the group as a whole.
Configuring Resource Allocation Limits
- lsb.resources file
- Enabling resource allocation limits
- Configuring cluster-wide limits
- Compatibility with pre-version 6.0 job slot limits
- How resource allocation limits map to pre-version 6.0 job slot limits
- Example limit configurations
lsb.resources file
Configure all resource allocation limits in one or more Limit sections in the lsb.resources file. Limit sections set limits for how much of the specified resources must be available for different classes of jobs to start, and which resource consumers the limits apply to.
Enabling resource allocation limits
Resource allocation limits scheduling plugin
To enable resource allocation limits in your cluster, configure the resource allocation limits scheduling plugin schmod_limit in lsb.modules.

Configuring lsb.modules

    Begin PluginModule
    SCH_PLUGIN       RB_PLUGIN   SCH_DISABLE_PHASES
    schmod_default   ()          ()
    schmod_limit     ()          ()
    End PluginModule

Configuring cluster-wide limits
To configure limits that take effect for your entire cluster, configure limits in lsb.resources, but do not specify any consumers.

Compatibility with pre-version 6.0 job slot limits
The Limit section of lsb.resources does not support the keywords or format used in lsb.users, lsb.hosts, and lsb.queues. However, any existing job slot limit configuration in these files will continue to apply.

How resource allocation limits map to pre-version 6.0 job slot limits
Job slot limits are the only type of limit you can configure in lsb.users, lsb.hosts, and lsb.queues. You cannot configure limits for user groups, host groups, and projects in lsb.users, lsb.hosts, and lsb.queues. You should not configure any new resource allocation limits in lsb.users, lsb.hosts, and lsb.queues. Use lsb.resources to configure all new resource allocation limits, including job slot limits.
Limits for the following resources have no corresponding limit in lsb.users, lsb.hosts, and lsb.queues:

How conflicting limits are resolved
For similar limits configured in lsb.resources, lsb.users, lsb.hosts, or lsb.queues, the most restrictive limit is used. For example, a slot limit of 3 for all users is configured in lsb.resources:

    Begin Limit
    NAME = user_limit1
    USERS = all
    SLOTS = 3
    End Limit

This is similar, but not equivalent, to an existing MAX_JOBS limit of 2 configured in lsb.users:

    % busers
    USER/GROUP JL/P MAX NJOBS PEND RUN SSUSP USUSP RSV
    user1      -    2   4     2    2   0     0     0

user1 submits 4 jobs:

    % bjobs
    JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME   SUBMIT_TIME
    816   user1 RUN  normal hostA     hostA     sleep 1000 Jan 22 16:34
    817   user1 RUN  normal hostA     hostA     sleep 1000 Jan 22 16:34
    818   user1 PEND normal hostA               sleep 1000 Jan 22 16:34
    819   user1 PEND normal hostA               sleep 1000 Jan 22 16:34

Two jobs (818 and 819) remain pending because the more restrictive limit of 2 from lsb.users is enforced:

    % bjobs -p
    JOBID USER  STAT QUEUE  FROM_HOST JOB_NAME   SUBMIT_TIME
    818   user1 PEND normal hostA     sleep 1000 Jan 22 16:34
    The user has reached his/her job slot limit;
    819   user1 PEND normal hostA     sleep 1000 Jan 22 16:34
    The user has reached his/her job slot limit;

If the MAX_JOBS limit in lsb.users is 4:

    % busers
    USER/GROUP JL/P MAX NJOBS PEND RUN SSUSP USUSP RSV
    user1      -    4   4     1    3   0     0     0

and user1 submits 4 jobs:

    % bjobs
    JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME   SUBMIT_TIME
    824   user1 RUN  normal hostA     hostA     sleep 1000 Jan 22 16:38
    825   user1 RUN  normal hostA     hostA     sleep 1000 Jan 22 16:38
    826   user1 RUN  normal hostA     hostA     sleep 1000 Jan 22 16:38
    827   user1 PEND normal hostA               sleep 1000 Jan 22 16:38

Only one job (827) remains pending because the more restrictive limit of 3 in lsb.resources is enforced:

    % bjobs -p
    JOBID USER  STAT QUEUE  FROM_HOST JOB_NAME   SUBMIT_TIME
    827   user1 PEND normal hostA     sleep 1000 Jan 22 16:38
    Resource (slot) limit defined cluster-wide has been reached;

New limits in lsb.resources that are equivalent to existing limits in lsb.users, lsb.hosts, or lsb.queues, but with a different value, override the existing limits. The equivalent limits in lsb.users, lsb.hosts, or lsb.queues are ignored, and the value of the new limit in lsb.resources is used.

For example, a per-user job slot limit in lsb.resources is equivalent to a MAX_JOBS limit in lsb.users, so only the lsb.resources limit is enforced and the limit in lsb.users is ignored:

    Begin Limit
    NAME = slot_limit
    PER_USER = all
    SLOTS = 3
    End Limit

Example limit configurations
Each set of limits is defined in a Limit section enclosed by Begin Limit and End Limit.
user1 is limited to 2 job slots on hostA, and user2's jobs on queue normal are limited to 20 MB of memory:

    Begin Limit
    HOSTS   SLOTS   MEM   SWP   TMP   USERS   QUEUES
    hostA   2       -     -     -     user1   -
    -       -       20    -     -     user2   normal
    End Limit

Set a job slot limit of 2 for user user1 submitting jobs to queue normal on host hosta for all projects, but only one job slot for all queues and hosts for project test:

    Begin Limit
    HOSTS   SLOTS   PROJECTS   USERS   QUEUES
    hosta   2       -          user1   normal
    -       1       test       user1   -
    End Limit

Limit usage of hosts in the license1 group:
- 10 jobs can run from the normal queue
- Any number of jobs can run from the short queue, but they can use only 200 MB of memory in total
- Each other queue can run 30 jobs, each queue using up to 300 MB of memory in total

    Begin Limit
    HOSTS      SLOTS   MEM   PER_QUEUE
    license1   10      -     normal
    license1   -       200   short
    license1   30      300   (all ~normal ~short)
    End Limit

All users in user group ugroup1 except user1 using queue1 and queue2 and running jobs on hosts in host group hgroup1 are limited to 2 job slots per processor on each host:

    Begin Limit
    NAME = limit1
    # Resources:
    SLOTS_PER_PROCESSOR = 2
    #Consumers:
    QUEUES = queue1 queue2
    USERS = ugroup1 ~user1
    PER_HOST = hgroup1
    End Limit

user1 and user2 can use all queues and all hosts in the cluster with a limit of 20 MB of available memory:

    Begin Limit
    NAME = 20_MB_mem
    # Resources:
    MEM = 20
    # Consumers:
    USERS = user1 user2
    End Limit

All users in user group ugroup1 can use queue1 and queue2 and run jobs on any host in host group hgroup1, sharing 10 job slots:

    Begin Limit
    NAME = 10_slot
    # Resources:
    SLOTS = 10
    #Consumers:
    QUEUES = queue1 queue2
    USERS = ugroup1
    HOSTS = hgroup1
    End Limit

All users in user group ugroup1 except user1 can use all queues but queue1 and run jobs with a limit of 10% of available memory on each host in host group hgroup1:

    Begin Limit
    NAME = 10_percent_mem
    # Resources:
    MEM = 10%
    QUEUES = all ~queue1
    USERS = ugroup1 ~user1
    PER_HOST = hgroup1
    End Limit

Limit users in the develop group to 1 job on each host, and 50% of the memory on the host:

    Begin Limit
    NAME = develop_group_limit
    # Resources:
    SLOTS = 1
    MEM = 50%
    #Consumers:
    USERS = develop
    PER_HOST = all
    End Limit

Limit software license lic1, with quantity 100, where user1 can use 90 licenses and all other users are restricted to 10:

    Begin Limit
    USERS          LICENSE
    user1          ([lic1,90])
    (all ~user1)   ([lic1,10])
    End Limit

lic1 is defined as a decreasing numeric shared resource in lsf.shared.

To submit a job that uses one lic1 license, use the rusage string in the -R option of bsub to specify the license:

    % bsub -R "rusage[lic1=1]" my-job

Jobs from the crash project can use 10 lic1 licenses, while jobs from all other projects together can use 5:

    Begin Limit
    LICENSE       PROJECTS
    ([lic1,10])   crash
    ([lic1,5])    (all ~crash)
    End Limit

lic1 is defined as a decreasing numeric shared resource in lsf.shared.

Limit each host to 1 job slot per processor:

    Begin Limit
    NAME = default_limit
    SLOTS_PER_PROCESSOR = 1
    PER_HOST = all
    End Limit
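For readers who want to inspect such configurations with a script, the horizontal (KEY = value) format shown in several of the examples above is straightforward to read. The following is a rough sketch only: parse_limit_sections is a hypothetical helper, not an LSF tool, it assumes that text after a # is a comment, and it does not handle the vertical tabular format.

```python
def parse_limit_sections(text):
    """Parse horizontal-format Limit sections into a list of dicts.

    Assumptions: '#' starts a comment, and every setting is a single
    'KEY = value' line between 'Begin Limit' and 'End Limit'.
    """
    sections, current = [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if line == "Begin Limit":
            current = {}
        elif line == "End Limit":
            if current is not None:
                sections.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


EXAMPLE = """
Begin Limit
NAME = 10_slot
# Resources:
SLOTS = 10
#Consumers:
QUEUES = queue1 queue2
USERS = ugroup1
HOSTS = hgroup1
End Limit
"""

for section in parse_limit_sections(EXAMPLE):
    print(section["NAME"], section)
```

Values are kept as strings so that forms such as 10% or ugroup1 ~user1 pass through unchanged.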
Viewing Information about Resource Allocation Limits
Your job may be pending because some configured resource allocation limit has been reached. Use the blimits command to show the dynamic counters of resource allocation limits configured in Limit sections in lsb.resources. blimits displays the current resource usage to show what limits may be blocking your job.

blimits command
The blimits command displays:
- Configured limit policy name
- Users (-u option)
- Queues (-q option)
- Hosts (-m option)
- Project names (-p option)

Resources that have no configured limits or no limit usage are indicated by a dash (-). Limits are displayed in a USED/LIMIT format. For example, if a limit of 10 slots is configured and 3 slots are in use, then blimits displays the limit for SLOTS as 3/10.

If limits MEM, SWP, or TMP are configured as percentages, both the limit and the amount used are displayed in MB. For example, lshosts displays maxmem of 249 MB, and MEM is limited to 10% of available memory. If 10 MB are used, blimits displays the limit for MEM as 10/25 (10 MB USED from a 25 MB LIMIT).

Configured limits and resource usage for built-in resources (slots, mem, tmp, and swp load indices) are displayed as INTERNAL RESOURCE LIMITS separately from custom external resources, which are shown as EXTERNAL RESOURCE LIMITS.

Limits are displayed for both the vertical tabular format and the horizontal format for Limit sections. Since a vertical format Limit section has no name, blimits displays NONAMEnnn under the NAME column for these limits, where the unnamed limits are numbered in the order the vertical-format Limit sections appear in the lsb.resources file.

If a resource consumer is configured as all, the limit usage for that consumer is indicated by a dash (-).

PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.

In MultiCluster, blimits returns the information about all limits in the local cluster.

Examples
For the following limit definitions:
    Begin Limit
    NAME = limit1
    USERS = user1
    PER_QUEUE = all
    PER_HOST = hostA hostC
    TMP = 30%
    SWP = 50%
    MEM = 10%
    End Limit

    Begin Limit
    NAME = limit_ext1
    PER_HOST = all
    RESOURCE = ([user1_num,30] [hc_num,20])
    End Limit

blimits displays the following:

    % blimits
    INTERNAL RESOURCE LIMITS:
    NAME       USERS QUEUES HOSTS PROJECTS SLOTS MEM   TMP     SWP
    limit1     user1 q2     hostA -        -     10/25 -       10/258
    limit1     user1 q3     hostA -        -     -     30/2953 -
    limit1     user1 q4     hostC -        -     -     40/590  -

    EXTERNAL RESOURCE LIMITS:
    NAME       USERS QUEUES HOSTS PROJECTS user1_num hc_num HC_num
    limit_ext1 -     -      hostA -        -         1/20   -
    limit_ext1 -     -      hostC -        1/30      1/20   -

- In limit policy limit1, user1 submitting jobs to q2, q3, or q4 on hostA or hostC is limited to 30% tmp space, 50% swap space, and 10% available memory. No limits have been reached, so the jobs from user1 should run. For example, on hostA for jobs from q2, 10 MB of memory are used from a 25 MB limit and 10 MB of swap space are used from a 258 MB limit.
- In limit policy limit_ext1, external resource user1_num is limited to 30 per host and external resource hc_num is limited to 20 per host. Again, no limits have been reached, so the jobs requesting those resources should run.
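The USED/LIMIT cells in the blimits output above can be read mechanically. The following is a minimal sketch with hypothetical helper names; it is not part of LSF. Two points are assumptions rather than documented behavior: that a percentage limit is rounded to the nearest MB (chosen because 10% of 249 MB is shown as 25), and that a limit counts as reached when USED is greater than or equal to LIMIT.

```python
def percent_limit_mb(total_mb, percent):
    """Convert a percentage limit to MB (assumed: rounded to nearest MB)."""
    return round(total_mb * percent / 100)


def parse_cell(cell):
    """Return (used, limit) for a USED/LIMIT cell, or None for a dash."""
    if cell == "-":
        return None
    used, limit = cell.split("/")
    return int(used), int(limit)


def limit_reached(cell):
    """Assumed rule: a limit is reached when used >= limit."""
    parsed = parse_cell(cell)
    return parsed is not None and parsed[0] >= parsed[1]


# maxmem of 249 MB with MEM = 10%, as in the blimits description.
print(percent_limit_mb(249, 10))

# Cells taken from the limit1 rows of the example output.
for cell in ["10/25", "10/258", "30/2953", "40/590", "-"]:
    print(cell, parse_cell(cell), limit_reached(cell))
```

None of the example cells reports a reached limit, which matches the statement that the jobs from user1 should run.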
Date Modified: January 12, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.