- About Resource Usage Limits
- Specifying Resource Usage Limits
- Supported Resource Usage Limits and Syntax
- CPU Time and Run Time Normalization
About Resource Usage Limits
Resource usage limits control how much resource can be consumed by running jobs. Jobs that use more than the specified amount of a resource are signalled or have their priority lowered.
Limits can be specified either at the queue level by your LSF administrator (lsb.queues) or at the job level when you submit a job.
For example, by defining a high-priority short queue, you can allow short jobs to be scheduled earlier than long jobs. To prevent some users from submitting long jobs to this short queue, you can set a CPU limit for the queue so that no job submitted to the queue can run for longer than that limit.
Limits specified at the queue level are hard limits, while those specified with job submission are soft limits. See the setrlimit(2) man page for the concepts of hard and soft limits.
Resource usage limits and resource allocation limits
Resource usage limits are not the same as resource allocation limits, which are enforced during job scheduling and before jobs are dispatched. You set resource allocation limits to restrict the amount of a given resource that must be available during job scheduling for different classes of jobs to start, and to specify which resource consumers the limits apply to.
See Resource Allocation Limits for more information.
Summary of resource usage limits
Priority of resource usage limits
If no limit is specified at job submission, then the following apply to all jobs submitted to the queue:
Incorrect resource usage limits
Incorrect limits are ignored, and a warning message is displayed when the cluster is reconfigured or restarted. A warning message is also logged to the mbatchd log file when LSF is started.
If no limit is specified at job submission, then the following apply to all jobs submitted to the queue:
Resource usage limits specified at job submission must be less than the maximum specified in lsb.queues. The job submission is rejected if the user-specified limit is greater than the queue-level maximum, and the following message is issued:
Cannot exceed queue's hard limit(s). Job not submitted.
Enforcing limits on chunk jobs
By default, resource usage limits are not enforced for chunk jobs because chunk jobs are typically too short to allow LSF to collect resource usage.
To enforce resource limits for chunk jobs, define LSB_CHUNK_RUSAGE=Y in lsf.conf. Limits may not be enforced for chunk jobs that take less than a minute to run.
Specifying Resource Usage Limits
Queues can enforce resource usage limits on running jobs. LSF supports most of the limits that the underlying operating system supports. In addition, LSF also supports a few limits that the underlying operating system does not support.
Specify queue-level resource usage limits using parameters in lsb.queues.
Specifying queue-level resource usage limits
Limits configured in lsb.queues apply to all jobs submitted to the queue. Job-level resource usage limits specified at job submission override the queue definitions.
Specify only a maximum value for the resource.
For example, to specify a maximum run limit, use one value for the RUNLIMIT parameter in lsb.queues:
RUNLIMIT = 10
The maximum run limit for the queue is 10 minutes. Jobs cannot run for more than 10 minutes. Jobs in the RUN state for longer than 10 minutes are killed by LSF.
If only one run limit is specified, jobs that are submitted with bsub -W with a run limit that exceeds the maximum run limit are not allowed to run. Jobs submitted without bsub -W are allowed to run, but are killed when they are in the RUN state for longer than the specified maximum run limit.
If you specify two limits, the first one is the default (soft) limit for jobs in the queue and the second one is the maximum (hard) limit. Both the default and the maximum limits must be positive integers. The default limit must be less than the maximum limit. The default limit is ignored if it is greater than the maximum limit.
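The default and maximum rules above, combined with the submission check described under Incorrect resource usage limits, can be sketched as a small Python function. This is a hypothetical illustration of the documented behavior, not LSF code: the name resolve_job_limit and the use of None for "not specified" are assumptions, and a job-level value equal to the maximum is treated as accepted because the text only says that larger values are rejected.

```python
from typing import Optional, Tuple


def resolve_job_limit(queue_default: Optional[int],
                      queue_max: Optional[int],
                      job_limit: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Return (accepted, effective_limit) for one job.

    queue_default/queue_max model a queue line such as RUNLIMIT = 10 15,
    job_limit models a job-level value such as bsub -W 12.
    None means "not specified" (hypothetical convention).
    """
    # A default that is greater than the maximum is ignored.
    if queue_default is not None and queue_max is not None and queue_default > queue_max:
        queue_default = None

    if job_limit is not None:
        # A job-level limit above the queue maximum is rejected at submission.
        if queue_max is not None and job_limit > queue_max:
            return False, None
        return True, job_limit

    # No job-level limit: use the default if there is one, else the maximum.
    if queue_default is not None:
        return True, queue_default
    return True, queue_max
```

For RUNLIMIT = 10 15, resolve_job_limit(10, 15, None) gives (True, 10), like submitting without -W, and resolve_job_limit(10, 15, 20) gives (False, None), like a rejected bsub -W 20 submission.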
Use the default limit to avoid having to specify resource usage limits in the bsub command.
For example, to specify a default and a maximum run limit, use two values for the RUNLIMIT parameter in lsb.queues:
RUNLIMIT = 10 15
- The first number is the default run limit applied to all jobs in the queue that are submitted without a job-specific run limit (without bsub -W).
- The second number is the maximum run limit applied to all jobs in the queue that are submitted with a job-specific run limit (with bsub -W). The default run limit must be less than the maximum run limit.
You can specify both default and maximum values for the following resource usage limits in lsb.queues: CPULIMIT, DATALIMIT, MEMLIMIT, PROCESSLIMIT, RUNLIMIT, and THREADLIMIT.
If default and maximum limits are specified for CPU time limits or run time limits, only one host specification is permitted. For example, the following CPU limits are correct (and have an identical effect):
CPULIMIT = 400/hostA 600
CPULIMIT = 400 600/hostA
The following CPU limit is incorrect:
CPULIMIT = 400/hostA 600/hostB
The following run limits are correct (and have an identical effect):
RUNLIMIT = 10/hostA 15
RUNLIMIT = 10 15/hostA
The following run limit is incorrect:
RUNLIMIT = 10/hostA 15/hostB
Default run limits for backfill scheduling
Default run limits are used for backfill scheduling of parallel jobs.
Automatically assigning a default run limit to all jobs in the queue means that backfill scheduling works efficiently.
For example, in lsb.queues, you enter:
RUNLIMIT = 10 15
The first number is the default run limit applied to all jobs in the queue that are submitted without a job-specific run limit. The second number is the maximum run limit.
If you submit a job to the queue without the -W option, the default run limit is used:
% bsub myjob
The job myjob cannot run for more than 10 minutes, as specified by the default run limit.
If you submit a job to the queue with the -W option, the job-level run limit is used, provided it does not exceed the maximum run limit:
% bsub -W 12 myjob
The job myjob is allowed to run on the queue because the specified run limit (12) is less than the maximum run limit for the queue (15).
% bsub -W 20 myjob
The job myjob is rejected from the queue because the specified run limit (20) is more than the maximum run limit for the queue (15).
Specifying job-level resource usage limits
To specify resource usage limits at the job level, use one of the following bsub options:
- -C core_limit
- -c cpu_limit
- -D data_limit
- -F file_limit
- -M mem_limit
- -p process_limit
- -W run_limit
- -S stack_limit
- -T thread_limit
- -v swap_limit
Job-level resource usage limits specified at job submission override the queue definitions.
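To show how the options above line up with the limit names, here is a hypothetical Python helper that assembles a bsub command line from a dictionary of limits. It only builds the argument list and does not run bsub; the helper name and the dictionary keys are illustrative assumptions.

```python
import shlex
from typing import Dict, List, Union

# bsub option for each job-level limit, as listed above.
BSUB_LIMIT_FLAGS: Dict[str, str] = {
    "core_limit": "-C",
    "cpu_limit": "-c",
    "data_limit": "-D",
    "file_limit": "-F",
    "mem_limit": "-M",
    "process_limit": "-p",
    "run_limit": "-W",
    "stack_limit": "-S",
    "thread_limit": "-T",
    "swap_limit": "-v",
}


def bsub_args(limits: Dict[str, Union[int, str]], command: List[str]) -> List[str]:
    """Return an argument list such as ['bsub', '-W', '12', 'myjob']."""
    args = ["bsub"]
    for name, value in limits.items():
        if name not in BSUB_LIMIT_FLAGS:
            raise ValueError(f"unknown limit name: {name}")
        args += [BSUB_LIMIT_FLAGS[name], str(value)]
    return args + command


print(shlex.join(bsub_args({"run_limit": 12, "mem_limit": 5000}, ["myjob"])))
# bsub -W 12 -M 5000 myjob
```

Values such as "10/DEC3000" pass through unchanged, so host-normalized CPU and run limits can be expressed the same way.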
Supported Resource Usage Limits and Syntax
Core file size limit
Job syntax (bsub): -C core_limit
Queue syntax (lsb.queues): CORELIMIT=limit
Format/Units: integer KB
Sets a per-process (soft) core file size limit in KB for each process that belongs to this batch job. On some systems, no core file is produced if the image for the process is larger than the core limit. On other systems, only the first core_limit KB of the image are dumped. The default is no soft limit.
CPU time limit
Job syntax (bsub): -c cpu_limit
Queue syntax (lsb.queues): CPULIMIT=[default] maximum
Format/Units: [hours:]minutes[/host_name | /host_model]
Sets the soft CPU time limit to cpu_limit for this batch job. The default is no limit. This option is useful for avoiding runaway jobs that use up too many resources. LSF keeps track of the CPU time used by all processes of the job.
When the job accumulates the specified amount of CPU time, a SIGXCPU signal is sent to all processes belonging to the job. If the job has no signal handler for SIGXCPU, the job is killed immediately. If the SIGXCPU signal is handled, blocked, or ignored by the application, then after the grace period expires, LSF sends SIGINT, SIGTERM, and SIGKILL to the job to kill it.
You can define whether the CPU limit is a per-process limit enforced by the OS or a per-job limit enforced by LSF with LSB_JOB_CPULIMIT in lsf.conf.
Jobs submitted to a chunk job queue are not chunked if the CPU limit is greater than 30 minutes.
cpu_limit is in the form [hour:]minute, where minute can be greater than 59. For example, 3.5 hours can be specified either as 3:30 or as 210.
The CPU time limit is normalized according to the CPU factor of the submission host and execution host. The CPU limit is scaled so that the job does approximately the same amount of processing for a given CPU limit, even if it is sent to a host with a faster or slower CPU.
For example, if a job is submitted from a host with a CPU factor of 2 and executed on a host with a CPU factor of 3, the CPU time limit is multiplied by 2/3 because the execution host can do the same amount of work as the submission host in 2/3 of the time.
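The [hour:]minute format and the scaling in this example can be sketched in Python. This is a minimal illustration under stated assumptions: the helper names parse_time_limit and scaled_limit are hypothetical, not part of LSF, and the scaling uses the submission host as the normalization host, as the example does.

```python
from typing import Optional, Tuple


def parse_time_limit(spec: str) -> Tuple[int, Optional[str]]:
    """Split "[hours:]minutes[/host_name | /host_model]" into (minutes, host_spec).

    Minutes may exceed 59, so "3:30" and "210" both mean 3.5 hours.
    """
    value, _, host_spec = spec.strip().partition("/")
    if ":" in value:
        hours, minutes = value.split(":", 1)
        total = int(hours) * 60 + int(minutes)
    else:
        total = int(value)
    return total, (host_spec or None)


def scaled_limit(minutes: float, norm_cpu_factor: float, exec_cpu_factor: float) -> float:
    """Time allowed on the execution host: limit * norm factor / exec factor."""
    return minutes * norm_cpu_factor / exec_cpu_factor


minutes, host = parse_time_limit("3:30")
print(minutes, host)                   # 210 None
print(scaled_limit(minutes, 2, 3))     # 140.0
print(parse_time_limit("10/DEC3000"))  # (10, 'DEC3000')
```

A 3.5-hour limit submitted from a host with CPU factor 2 therefore allows 140 minutes on a host with CPU factor 3, which is the 2/3 scaling described above.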
If the optional host name or host model is not given, the CPU limit is scaled based on the DEFAULT_HOST_SPEC specified in the lsb.params file. (If DEFAULT_HOST_SPEC is not defined, the fastest batch host in the cluster is used as the default.) If a host or host model is given, its CPU scaling factor is used to adjust the actual CPU time limit at the execution host.
The following example specifies that myjob can run for 10 minutes on a DEC3000 host, or the corresponding time on any other host:
% bsub -c 10/DEC3000 myjob
See CPU Time and Run Time Normalization for more information.
Data segment size limit
Job syntax (bsub): -D data_limit
Queue syntax (lsb.queues): DATALIMIT=[default] maximum
Format/Units: integer KB
Sets a per-process (soft) data segment size limit in KB for each process that belongs to this batch job. An sbrk() or malloc() call to extend the data segment beyond the data limit returns an error. The default is no soft limit.
File size limit
Job syntax (bsub): -F file_limit
Queue syntax (lsb.queues): FILELIMIT=limit
Format/Units: integer KB
Sets a per-process (soft) file size limit in KB for each process that belongs to this batch job. If a process of this job attempts to write to a file such that the file size would increase beyond the file limit, the kernel sends that process a SIGXFSZ signal. This condition normally terminates the process, but may be caught. The default is no soft limit.
Memory limit
Job syntax (bsub): -M mem_limit
Queue syntax (lsb.queues): MEMLIMIT=[default] maximum
Format/Units: integer KB
Sets the memory limit, in KB.
If LSB_MEMLIMIT_ENFORCE or LSB_JOB_MEMLIMIT in lsf.conf is set to y, LSF kills the job when it exceeds the memory limit. Otherwise, LSF passes the memory limit to the operating system. Some operating systems apply the memory limit to each process, and some do not enforce the memory limit at all.
To enable LSF memory limit enforcement, set LSB_MEMLIMIT_ENFORCE in lsf.conf to y. LSF memory limit enforcement explicitly sends a signal to kill a running process once it has allocated memory past mem_limit.
You can also enable LSF memory limit enforcement by setting LSB_JOB_MEMLIMIT in lsf.conf to y.
The difference between the two settings is that LSB_JOB_MEMLIMIT enables only the per-job memory limit enforced by LSF and disables the per-process memory limit enforced by the OS, while LSB_MEMLIMIT_ENFORCE enables both the per-job memory limit enforced by LSF and the per-process memory limit enforced by the OS.
With LSB_JOB_MEMLIMIT set to y, when the total memory allocated to all processes in the job exceeds the memory limit, LSF sends the following signals to kill the job: SIGINT first, then SIGTERM, then SIGKILL.
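The lsf.conf settings described above can be summarized in a small Python sketch: which enforcement is active for each combination, and when the per-job check leads to the SIGINT, SIGTERM, SIGKILL sequence. The helper names and the per-process memory figures are hypothetical, and this section does not describe the case where both parameters are set to y; the sketch assumes LSB_JOB_MEMLIMIT's behavior applies then.

```python
from typing import Dict, Iterable, Tuple

# Signals LSF sends, in order, when the per-job memory limit is exceeded.
KILL_SEQUENCE: Tuple[str, ...] = ("SIGINT", "SIGTERM", "SIGKILL")


def memlimit_enforcement(lsb_job_memlimit: bool, lsb_memlimit_enforce: bool) -> Dict[str, bool]:
    """Report which memory limit enforcement is enabled for the given lsf.conf values."""
    if lsb_job_memlimit:
        # Per-job limit enforced by LSF; OS per-process limit disabled.
        # Assumption: this also applies when LSB_MEMLIMIT_ENFORCE is set.
        return {"lsf_per_job": True, "os_per_process": False}
    if lsb_memlimit_enforce:
        # Both the LSF per-job limit and the OS per-process limit are enabled.
        return {"lsf_per_job": True, "os_per_process": True}
    # Neither is set: the limit is only passed to the OS, which may not enforce it.
    return {"lsf_per_job": False, "os_per_process": True}


def signals_for_job(process_mem_kb: Iterable[int], mem_limit_kb: int) -> Tuple[str, ...]:
    """Signals sent when the total memory of all job processes exceeds the limit."""
    total_kb = sum(process_mem_kb)
    return KILL_SEQUENCE if total_kb > mem_limit_kb else ()
```

For example, signals_for_job([3000, 2500], 5000) returns the full kill sequence because the job totals 5500 KB, even though neither process alone exceeds 5000 KB; this is the per-job behavior that distinguishes LSF enforcement from a per-process OS limit.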
On UNIX, the time interval between SIGINT, SIGTERM, and SIGKILL can be configured with the parameter JOB_TERMINATE_INTERVAL in lsb.params.
OS enforcement usually allows the process to eventually run to completion. LSF passes mem_limit to the OS, which uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus. When memory is low, the system takes memory from, and lowers the scheduling priority (re-nice) of, a process that has exceeded its declared mem_limit.
OS memory limit enforcement is only available on systems that support RLIMIT_RSS for setrlimit().
The following operating systems do not support the memory limit at the OS level:
Process limit
Job syntax (bsub): -p process_limit
Queue syntax (lsb.queues): PROCESSLIMIT=[default] maximum
Format/Units: integer
Sets the limit of the number of processes to process_limit for the whole job. The default is no limit. Exceeding the limit causes the job to terminate.
Limits the number of concurrent processes that can be part of a job.
If a default process limit is specified, jobs submitted to the queue without a job-level process limit are killed when the default process limit is reached.
If you specify only one limit, it is the maximum, or hard, process limit. If you specify two limits, the first one is the default, or soft, process limit, and the second one is the maximum process limit.
Run time limit
Job syntax (bsub): -W run_limit
Queue syntax (lsb.queues): RUNLIMIT=[default] maximum
Format/Units: [hours:]minutes[/host_name | /host_model]
A run time limit is the maximum amount of time a job can run before it is terminated. The default is no limit. If the accumulated time the job has spent in the RUN state exceeds this limit, the job is sent a USR2 signal. If the job does not terminate within 10 minutes after being sent this signal, it is killed.
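The run limit timeline described above can be sketched as a Python function that reports which phase a job is in after a given number of minutes in the RUN state. The function name and phase strings are hypothetical; the sketch assumes the limit has already been normalized for the execution host, and treating the limit as exceeded only when the run time is strictly greater is an assumption.

```python
from typing import Optional

# Minutes LSF waits after sending USR2 before killing the job (from the text above).
USR2_GRACE_MINUTES = 10


def run_limit_phase(run_minutes: float, run_limit: Optional[float]) -> str:
    """Phase of a job that has accumulated run_minutes in the RUN state."""
    if run_limit is None or run_minutes <= run_limit:
        return "running"      # no run limit, or still within the limit
    if run_minutes <= run_limit + USR2_GRACE_MINUTES:
        return "sent USR2"    # limit exceeded; the job may still exit on its own
    return "killed"           # did not terminate within 10 minutes of USR2
```

With a 10-minute run limit, a job that has run for 12 minutes has been sent USR2, and one still running at 21 minutes has been killed.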
With deadline constraint scheduling configured, a run limit also specifies the amount of time a job is expected to take, and the minimum amount of time that must be available before a job can be started.
Jobs submitted to a chunk job queue are not chunked if the run limit is greater than 30 minutes.
run_limit is in the form [hour:]minute, where minute can be greater than 59. For example, 3.5 hours can be specified either as 3:30 or as 210.
The run time limit is normalized according to the CPU factor of the submission host and execution host. The run limit is scaled so that the job has approximately the same run time for a given run limit, even if it is sent to a host with a faster or slower CPU.
For example, if a job is submitted from a host with a CPU factor of 2 and executed on a host with a CPU factor of 3, the run limit is multiplied by 2/3 because the execution host can do the same amount of work as the submission host in 2/3 of the time.
If the optional host name or host model is not given, the run limit is scaled based on the DEFAULT_HOST_SPEC specified in the lsb.params file. (If DEFAULT_HOST_SPEC is not defined, the fastest batch host in the cluster is used as the default.) If a host or host model is given, its CPU scaling factor is used to adjust the actual run limit at the execution host.
The following example specifies that myjob can run for 10 minutes on a DEC3000 host, or the corresponding time on any other host:
% bsub -W 10/DEC3000 myjob
If ABS_RUNLIMIT=Y is defined in lsb.params, the run time limit is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted with a run limit.
See CPU Time and Run Time Normalization for more information.
For MultiCluster jobs, if no other CPU time normalization host is defined and information about the submission host is not available, LSF uses the host with the largest CPU factor (the fastest host in the cluster). The ABS_RUNLIMIT parameter in lsb.params is not supported in either MultiCluster model; the run time limit is normalized by the CPU factor of the execution host.
Thread limit
Job syntax (bsub): -T thread_limit
Queue syntax (lsb.queues): THREADLIMIT=[default] maximum
Format/Units: integer
Sets the limit of the number of concurrent threads to thread_limit for the whole job. The default is no limit.
Exceeding the limit causes the job to terminate. The system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL.
If a default thread limit is specified, jobs submitted to the queue without a job-level thread limit are killed when the default thread limit is reached.
If you specify only one limit, it is the maximum, or hard, thread limit. If you specify two limits, the first one is the default, or soft, thread limit, and the second one is the maximum thread limit.
Stack segment size limit
Job syntax (bsub): -S stack_limit
Queue syntax (lsb.queues): STACKLIMIT=limit
Format/Units: integer KB
Sets a per-process (soft) stack segment size limit in KB for each process that belongs to this batch job. Extending the stack segment beyond the stack limit causes the process to be terminated. The default is no soft limit.
Virtual memory (swap) limit
Job syntax (bsub): -v swap_limit
Queue syntax (lsb.queues): SWAPLIMIT=limit
Format/Units: integer KB
Sets the total process virtual memory limit to swap_limit in KB for the whole job. The default is no limit. Exceeding the limit causes the job to terminate.
This limit applies to the whole job, no matter how many processes the job may contain.
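The core, data, file, and stack limits above are per-process soft limits in the setrlimit(2) sense referred to earlier in this chapter. As an illustration of that concept only, not of how LSF applies the limits internally, the Python sketch below uses the standard resource module on a UNIX host to set a soft limit given in KB. The mapping table and the helper name are assumptions made for this example.

```python
import resource

# setrlimit(2) resource assumed to correspond to each per-process limit.
PER_PROCESS_LIMITS = {
    "core_limit": resource.RLIMIT_CORE,
    "data_limit": resource.RLIMIT_DATA,
    "file_limit": resource.RLIMIT_FSIZE,
    "stack_limit": resource.RLIMIT_STACK,
}


def set_soft_limit_kb(name: str, limit_kb: int) -> int:
    """Set this process's soft limit from a value in KB; return it in bytes."""
    res = PER_PROCESS_LIMITS[name]
    _soft, hard = resource.getrlimit(res)
    soft = limit_kb * 1024  # LSF limits are in KB; setrlimit uses bytes
    if hard != resource.RLIM_INFINITY:
        soft = min(soft, hard)  # a soft limit may not exceed the hard limit
    resource.setrlimit(res, (soft, hard))
    return soft


print(set_soft_limit_kb("core_limit", 0))            # 0
print(resource.getrlimit(resource.RLIMIT_CORE)[0])   # 0
```

Setting the core limit to 0 turns off core dump files for the calling process. The hard limit is left unchanged, which mirrors the queue-level (hard) versus job-level (soft) split described at the start of the chapter.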
Examples
CPULIMIT = 20/hostA 15
The first number is the default CPU limit. The second number is the maximum CPU limit.
However, the default CPU limit is ignored because it is a higher value than the maximum CPU limit.
CPULIMIT = 10/hostA
In this example, the lack of a second number specifies that there is no default CPU limit. The specified number is used as both the default and the maximum CPU limit.
RUNLIMIT = 10/hostA 15
The first number is the default run limit. The second number is the maximum run limit.
The first number specifies that the default run limit is to be used for jobs that are submitted without a specified run limit (without the -W option of bsub).
RUNLIMIT = 10/hostA
No default run limit is specified. The specified number is used as both the default and the maximum run limit.
THREADLIMIT=6
No default thread limit is specified. The value 6 is the default and maximum thread limit.
THREADLIMIT=6 8
The first value (6) is the default thread limit. The second value (8) is the maximum thread limit.
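The queue-level syntax in these examples, including the rule stated earlier that CPU and run limits may carry only one host specification, can be sketched as a small Python parser. It is a hypothetical helper for illustration: QueueLimit and parse_queue_limit are invented names, and a single value is represented as a maximum with no separate default, which behaves as both the default and the maximum.

```python
from typing import NamedTuple, Optional


class QueueLimit(NamedTuple):
    default: Optional[int]    # None: no separate default, the maximum applies
    maximum: int
    host_spec: Optional[str]


def _parse_value(text: str) -> int:
    """Accept plain integers and [hours:]minutes values."""
    if ":" in text:
        hours, minutes = text.split(":", 1)
        return int(hours) * 60 + int(minutes)
    return int(text)


def parse_queue_limit(line: str) -> QueueLimit:
    """Parse lines such as 'CPULIMIT = 20/hostA 15' or 'THREADLIMIT=6 8'."""
    _, sep, rhs = line.partition("=")
    tokens = rhs.split()
    if not sep or len(tokens) not in (1, 2):
        raise ValueError("expected NAME = [default] maximum")
    values, hosts = [], []
    for token in tokens:
        value, _, host = token.partition("/")
        values.append(_parse_value(value))
        if host:
            hosts.append(host)
    if len(hosts) > 1:
        raise ValueError("only one host specification is permitted")
    host_spec = hosts[0] if hosts else None
    if len(values) == 1:
        return QueueLimit(None, values[0], host_spec)
    default, maximum = values
    if default > maximum:
        default = None  # a default greater than the maximum is ignored
    return QueueLimit(default, maximum, host_spec)


print(parse_queue_limit("RUNLIMIT = 10/hostA 15"))
# QueueLimit(default=10, maximum=15, host_spec='hostA')
```

Parsing CPULIMIT = 20/hostA 15 returns a maximum of 15 with no default, matching the first example, and CPULIMIT = 400/hostA 600/hostB raises an error because it carries two host specifications.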
% bsub -M 5000 myjob
Submits myjob with a memory limit of 5000 KB.
% bsub -W 14 myjob
myjob is expected to run for 14 minutes. If the run limit specified with bsub -W exceeds the value for the queue, the job will be rejected.
% bsub -T 4 myjob
Submits myjob with a limit of 4 concurrent threads.
CPU Time and Run Time Normalization
To set the CPU time limit and run time limit for jobs in a platform-independent way, LSF scales the limits by the CPU factor of the hosts involved. When a job is dispatched to a host for execution, the limits are then normalized according to the CPU factor of the execution host.
Whenever a normalized CPU time or run time is given, the actual time on the execution host is the specified time multiplied by the CPU factor of the normalization host then divided by the CPU factor of the execution host.
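Written as a formula, with T_spec the specified limit, CF_norm the CPU factor of the normalization host, and CF_exec the CPU factor of the execution host (symbol names chosen here for illustration), the rule above and the 10-minute hostA/hostB example in this section read:

```latex
T_{\mathrm{exec}} = T_{\mathrm{spec}} \times \frac{CF_{\mathrm{norm}}}{CF_{\mathrm{exec}}}
\qquad\text{e.g.}\qquad
10 \times \frac{2}{1} = 20 \text{ minutes}
```

A faster execution host (larger CF_exec) therefore gets a proportionally shorter actual limit, and a slower one a longer limit.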
If ABS_RUNLIMIT=Y is defined in lsb.params, the run time limit is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted with a run limit.
Normalization host
If no host or host model is given with the CPU time or run time, LSF uses the default CPU time normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise, it uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise, it uses the submission host.
For example:
CPULIMIT=10/hostA
If hostA has a CPU factor of 2, and hostB has a CPU factor of 1 (hostB is slower than hostA), this specifies an actual time limit of 10 minutes on hostA, or on any other host that has a CPU factor of 2. However, if hostB is the execution host, the actual time limit on hostB is 20 minutes (10 * 2 / 1).
Normalization hosts for default CPU and run time limits
The first valid CPU factor encountered is used for both CPU limit and run time limit. To be valid, a host specification must be a valid host name that is a member of the LSF cluster. The CPU factor is used even if the specified limit is not valid.
If the CPU and run limit have different host specifications, the CPU limit host specification is enforced.
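The host-specification rules in this subsection, including the default priority listed next, can be sketched as a Python function that picks the CPU factor used for both the default CPU limit and the default run limit. It is a hypothetical illustration: cpu_factors stands in for the cluster's hosts and their CPU factors, host models are not handled, and a configured DEFAULT_HOST_SPEC that names an unknown host is assumed to be skipped.

```python
from typing import Dict, Optional


def normalization_cpu_factor(cpu_host_spec: Optional[str],
                             run_host_spec: Optional[str],
                             queue_default_host_spec: Optional[str],
                             params_default_host_spec: Optional[str],
                             cpu_factors: Dict[str, float]) -> float:
    """CPU factor used to normalize a queue's default CPU and run limits.

    A host specification counts as valid only if it names a host in cpu_factors.
    """
    # The CPU limit's host specification is checked first, so it wins
    # when the CPU and run limits name different hosts.
    for spec in (cpu_host_spec, run_host_spec):
        if spec in cpu_factors:
            return cpu_factors[spec]
    # No valid host specification: DEFAULT_HOST_SPEC in lsb.queues,
    # then DEFAULT_HOST_SPEC in lsb.params, then the largest CPU factor.
    for spec in (queue_default_host_spec, params_default_host_spec):
        if spec in cpu_factors:
            return cpu_factors[spec]
    return max(cpu_factors.values())
```

With hostA at CPU factor 2 and hostB at 1, CPULIMIT = 10/hostA combined with RUNLIMIT = 10/hostB normalizes both limits with factor 2, because the CPU limit's host specification is enforced.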
If no host or host model is given with the CPU or run time limits, LSF determines the default normalization host according to the following priority:
- DEFAULT_HOST_SPEC is configured in lsb.queues
- DEFAULT_HOST_SPEC is configured in lsb.params
- If DEFAULT_HOST_SPEC is not configured in lsb.queues or lsb.params, the host with the largest CPU factor is used.
CPU time display (bacct, bhist, bqueues)
Normalized CPU time is displayed in the output of bqueues. CPU time is not normalized in the output of bacct and bhist.
Date Modified: January 12, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.