



Welcome





About This Guide

Purpose of this guide

This guide describes how to manage and configure Platform LSF® software ("LSF"). In it, you will find information to do the following:

Who should use this guide

This guide is intended for Platform LSF cluster administrators who need to implement business policies in LSF. Users who want a more in-depth understanding of advanced details of LSF operation should also read this guide. Users who simply want to run and monitor their jobs should read Running Jobs with Platform LSF.

What you should already know

This guide assumes:

Typographical conventions

  • Courier is used for the names of on-screen computer output, commands, files, and directories. Example: The lsid command
  • Bold Courier shows what you type, exactly as shown. Example: Type cd /bin
  • Italics indicate book titles, new words or terms, or words to be emphasized, and command-line placeholders that you replace with a real name or value. Example: The queue specified by queue_name
  • Bold Sans Serif indicates the names of GUI elements that you manipulate. Example: Click OK

Command notation

  • Quotes " or ' must be entered exactly as shown. Example: "job_ID[index_list]"
  • Commas , must be entered exactly as shown. Example: -C time0,time1
  • An ellipsis ... means the argument before the ellipsis can be repeated. Do not enter the ellipsis. Example: job_ID ...
  • Lower case italics mean the argument must be replaced with a real value you provide. Example: job_ID
  • An OR bar | means you must enter one of the items separated by the bar. You cannot enter more than one item. Do not enter the bar. Example: [-h | -V]
  • Parentheses ( ) must be entered exactly as shown. Example: -X "exception_cond([params])::action] ...
  • An option or variable in square brackets [ ] is optional. Do not enter the brackets. Example: lsid [-h]
  • Shell prompts: C shell %, Bourne shell and Korn shell $, root account #. Unless otherwise noted, the C shell prompt is used in all command examples. Example: % cd /bin



What's New in Platform LSF Version 6.0

Platform LSF Version 6.0 introduces the following new features:

Policy management

Goal-oriented SLA-driven scheduling

Goal-oriented SLA-driven scheduling policies help you configure your workload so that your jobs are completed on time and reduce the risk of missed deadlines:

You implement your SLA scheduling policies in service classes associated with your projects and users. Each service class defines how many jobs should be run to meet different kinds of goals:

You use the bsla command to track the progress of your projects and see whether they are meeting the goals of your policy.

See Goal-Oriented SLA-Driven Scheduling for more information.
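The arithmetic behind a service class goal can be sketched in a few lines. The Python below is a hypothetical back-of-envelope model, not the LSF scheduler's algorithm: it assumes a deadline-style goal (finish a given number of pending jobs before a time limit), a throughput-style goal (a target number of finished jobs per hour), and that every job takes about the same average run time. The function names are illustrative only.

```python
import math


def concurrent_jobs_for_deadline(pending_jobs: int, avg_runtime_min: float,
                                 minutes_left: float) -> int:
    """Lower bound on jobs that must run at once to finish pending_jobs
    before the deadline, if each job runs for about avg_runtime_min."""
    if minutes_left <= 0:
        raise ValueError("the deadline has already passed")
    total_work_min = pending_jobs * avg_runtime_min
    return math.ceil(total_work_min / minutes_left)


def concurrent_jobs_for_throughput(jobs_per_hour: float,
                                   avg_runtime_min: float) -> int:
    """Little's law: jobs in progress = completion rate x time per job."""
    return math.ceil(jobs_per_hour * avg_runtime_min / 60)


print(concurrent_jobs_for_deadline(120, 30, 600))  # 6
print(concurrent_jobs_for_throughput(20, 45))      # 15
```

For example, 120 pending 30-minute jobs due in 10 hours need at least 6 jobs running at once. In LSF itself you configure the goals in the service class and track progress with bsla; the model only shows why a tighter deadline or a longer run time raises the number of jobs that must run concurrently.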

Platform LSF License Scheduler

Platform LSF License Scheduler ensures that higher priority work never has to wait for a license. Prioritized sharing of application licenses allows you to make policies that control the way software licenses are shared among different users in your organization.

You configure your software license distribution policy and LSF intelligently allocates licenses to improve quality of service to your end users while increasing throughput of high-priority work and reducing license costs.

It has the following features:

See Using Platform LSF License Scheduler for installation and configuration instructions.

Platform LSF license-aware scheduling is available as separately installable add-on packages located in /license_scheduler/ on the Platform FTP site (ftp.platform.com/).

Job-level exception management

Configure hosts and queues so that LSF takes appropriate action automatically when it detects exceptional conditions while jobs are running. Customize what exceptions are detected, and their corresponding actions.

LSF detects:

See Working with Hosts for more information.
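A host-level exception such as a high job exit rate amounts to comparing a recent rate against a threshold. The following is a hypothetical sketch, not LSF's implementation: it assumes the threshold is a number of exited jobs per minute and that the rate is averaged over a 5-minute sliding window. The class and method names are illustrative.

```python
from collections import deque


class ExitRateMonitor:
    """Hypothetical host exception check (not LSF's implementation).

    Flags the host when the average number of jobs that exited per minute
    over the last window_min minutes exceeds exit_rate.
    """

    def __init__(self, exit_rate: float, window_min: float = 5.0):
        self.exit_rate = exit_rate      # assumed threshold: exited jobs per minute
        self.window_min = window_min    # assumed length of the sliding window
        self.exits = deque()            # exit times in minutes, recorded oldest first

    def record_exit(self, t_min: float) -> None:
        self.exits.append(t_min)

    def is_exception(self, now_min: float) -> bool:
        # Forget exits that happened before the start of the window.
        while self.exits and self.exits[0] <= now_min - self.window_min:
            self.exits.popleft()
        return len(self.exits) / self.window_min > self.exit_rate


monitor = ExitRateMonitor(exit_rate=2)
for i in range(11):                     # 11 exits between t=1.0 and t=2.0
    monitor.record_exit(1.0 + i * 0.1)
print(monitor.is_exception(3.0))        # True: 11 exits / 5 min = 2.2 per minute
print(monitor.is_exception(10.0))       # False: every exit is older than 5 minutes
```

In LSF the corresponding action is configured rather than coded; the sketch only illustrates why a burst of exits on one host can trip the exception while the same number of exits spread over a longer period does not.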

Queue-based fairshare

Prevents starvation of low-priority work and ensures high-priority jobs get the resources they require by sharing resources among queues. Queue-based fairshare extends your existing user- and project-based fairshare policies by enabling flexible slot allocation per queue based on slot share units you configure.

See Fairshare Scheduling for more information.
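Slot share units can be illustrated with a proportional split of a shared slot pool. The sketch below assumes a fixed number of job slots shared by several queues, each with an integer share value, and uses largest-remainder rounding so the allocations add up exactly. Queue names and numbers are hypothetical, and LSF's actual queue-based fairshare may account for factors this ignores, such as slots left unused by idle queues.

```python
def allocate_slots(total_slots: int, slot_shares: dict[str, int]) -> dict[str, int]:
    """Split total_slots among queues in proportion to their slot shares."""
    total_shares = sum(slot_shares.values())
    alloc: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for queue, share in slot_shares.items():
        # Exact integer arithmetic: whole slots plus the remainder left over.
        alloc[queue], remainders[queue] = divmod(total_slots * share, total_shares)
    leftover = total_slots - sum(alloc.values())
    # Largest-remainder rounding: give leftover slots to the queues whose
    # proportional share was cut most by integer division.
    for queue in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[queue] += 1
    return alloc


print(allocate_slots(100, {"short": 50, "normal": 30, "long": 20}))
# {'short': 50, 'normal': 30, 'long': 20}
print(allocate_slots(10, {"short": 2, "long": 1}))
# {'short': 7, 'long': 3}
```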

User fairshare by queue priority

Improves control of user-based fairshare by taking queue priority into account for dispatching jobs from different queues against the same user fairshare policy. Within the queue, dispatch order is based on share quota.

See Fairshare Scheduling for more information.
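The resulting dispatch order can be sketched as a two-level sort: queue priority first, then the user's fairshare standing within the same queue. The PendingJob fields below are hypothetical inputs chosen for illustration; they are not LSF data structures, and LSF computes user priority from its own fairshare formula.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    job_id: int
    queue_priority: int    # higher value = higher-priority queue
    user_priority: float   # hypothetical fairshare priority of the job's owner


def dispatch_order(jobs: list[PendingJob]) -> list[int]:
    """Order jobs by queue priority, then by user fairshare priority."""
    ordered = sorted(jobs, key=lambda j: (-j.queue_priority, -j.user_priority))
    return [j.job_id for j in ordered]


jobs = [PendingJob(1, 30, 0.5), PendingJob(2, 40, 0.1), PendingJob(3, 30, 0.9)]
print(dispatch_order(jobs))  # [2, 3, 1]
```

Job 2 goes first because its queue has the highest priority, even though its owner has the lowest fairshare standing; jobs 3 and 1 share a queue, so their owners' fairshare priorities decide between them.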

Job group support

Use LSF job groups to organize and control a collection of individual jobs in higher level work units for easy management. A job group is a container for jobs in much the same way that a directory in a file system is a container for files. For example, you can organize jobs around groups that are meaningful to your business: a payroll application may have one group of jobs that calculates weekly payments, another job group for calculating monthly salaries, and a third job group that handles the salaries of part-time or contract employees.

Job groups increase end-user productivity by reducing complexity:

See Managing Jobs for more information.
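The directory analogy can be made concrete with a small tree. The following is a hypothetical sketch of the payroll example, not LSF's internal representation: groups are addressed by slash-separated paths, missing groups are created on demand, and a group's jobs include those of all its subgroups. Group paths and job IDs are illustrative.

```python
class JobGroup:
    """Hypothetical job-group tree modeled on the directory analogy."""

    def __init__(self, name: str):
        self.name = name
        self.children: dict[str, "JobGroup"] = {}
        self.jobs: list[int] = []          # IDs of jobs placed directly in this group

    def get_or_create(self, path: str) -> "JobGroup":
        """Walk a path such as /payroll/weekly, creating missing groups
        along the way (similar to mkdir -p)."""
        group = self
        for part in path.strip("/").split("/"):
            if not part:
                continue
            if part not in group.children:
                group.children[part] = JobGroup(part)
            group = group.children[part]
        return group

    def all_jobs(self) -> list[int]:
        """Jobs in this group and all of its subgroups."""
        jobs = list(self.jobs)
        for child in self.children.values():
            jobs.extend(child.all_jobs())
        return jobs


root = JobGroup("/")
root.get_or_create("/payroll/weekly").jobs.extend([101, 102])
root.get_or_create("/payroll/monthly").jobs.append(201)
root.get_or_create("/payroll/contract").jobs.append(301)
print(root.get_or_create("/payroll").all_jobs())  # [101, 102, 201, 301]
```

Acting on /payroll reaches every job beneath it, which is the same idea behind group-level operations such as bresume -g job_group_name.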

High Performance Computing

Dynamic ptile enforcement

Parallel jobs can now request a different number of processors per host for each kind of host in a heterogeneous cluster.

Improves the performance and throughput of parallel jobs by setting multiple ptile values in a span string according to the CPU configuration of the host type or model.

You can specify various ptile values in the queue (RES_REQ in lsb.queues) or at job submission (bsub -R):

See Specifying Resource Requirements for more information.
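The effect of per-type ptile values can be sketched as a walk over candidate hosts. This is a simplified model under stated assumptions, not LSF's allocation logic: hosts are considered in a fixed order, each contributes up to the ptile value for its type (or a default), and the last host may receive fewer processors. The host type names SGI and LINUX and the default_ptile fallback are hypothetical.

```python
def slots_per_host(host_types: list[str], ptile_by_type: dict[str, int],
                   default_ptile: int, n_procs: int) -> list[int]:
    """Processors taken on each candidate host, in order."""
    alloc = []
    remaining = n_procs
    for host_type in host_types:
        if remaining == 0:
            break
        # Use the ptile configured for this host type, else the default.
        take = min(ptile_by_type.get(host_type, default_ptile), remaining)
        alloc.append(take)
        remaining -= take
    if remaining:
        raise RuntimeError(
            f"only {n_procs - remaining} of {n_procs} processors available")
    return alloc


# Request 16 processors from an ordered list of SGI and LINUX hosts.
print(slots_per_host(["SGI", "LINUX", "LINUX", "SGI"],
                     {"SGI": 8, "LINUX": 2}, default_ptile=1, n_procs=16))
# [8, 2, 2, 4]
```

A single ptile value would force the same per-host count on every type, either underusing the 8-CPU hosts or overloading the 2-CPU ones; per-type values avoid that trade-off.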

Resource requirement specification for advance reservation

You no longer need to specify a host list manually for your advance reservations. Specify a resource requirement string with the -R option of brsvadd instead of or in addition to a list of hosts. This makes advance reservation specification more flexible by reserving host slots based on your specific resource requirements. Only hosts that satisfy the resource requirement expression are reserved.

See Advance Reservation for more information.
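Reserving by resource requirement means filtering hosts before taking slots. The sketch below models the requirement as a Python predicate over hypothetical host attributes (type, memory, slots); it stands in for an LSF resource requirement string and is not how brsvadd evaluates one. Host names and values are illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Host:
    name: str
    host_type: str
    max_mem_mb: int
    slots: int


def reserve_slots(hosts: list[Host], requirement: Callable[[Host], bool],
                  n_slots: int) -> dict[str, int]:
    """Take n_slots only from hosts that satisfy the requirement."""
    plan: dict[str, int] = {}
    for host in hosts:
        if n_slots == 0:
            break
        if requirement(host):           # hosts that do not match are never reserved
            take = min(host.slots, n_slots)
            plan[host.name] = take
            n_slots -= take
    if n_slots:
        raise RuntimeError("not enough matching slots")
    return plan


hosts = [Host("hostA", "LINUX", 8192, 4), Host("hostB", "SOL", 8192, 8),
         Host("hostC", "LINUX", 2048, 4), Host("hostD", "LINUX", 4096, 4)]
linux_4g = lambda h: h.host_type == "LINUX" and h.max_mem_mb >= 4096
print(reserve_slots(hosts, linux_4g, 6))  # {'hostA': 4, 'hostD': 2}
```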

Administration and diagnosis

Scheduler dynamic debug

Enables dynamic debugging of the LSF scheduler daemon (mbschd) without reconfiguring the cluster. Administrators no longer need to run badmin mbdrestart to debug the LSF scheduler:

badmin schddebug [-c class_name] [-l debug_level] [-f logfile_name] [-o]
badmin schdtime [-l timing_level] [-f logfile_name] [-o]

See Troubleshooting and Error Messages for more information.

Administrator action messages

Improves communication of LSF status to users. Users can see the reasons for administrator actions, and administrators can easily communicate those actions to users.

Administrators can attach a message to mbatchd restarts and to host and queue operations:

To see administrator comments, users run badmin hist, badmin mbdhist, badmin hhist, or badmin qhist.

See Working with Your Cluster, Working with Hosts, and Working with Queues for more information.

Platform LSF Reports

Understand cluster operations better, so that you can improve performance and troubleshoot configuration problems.

Platform LSF Reports provides a lightweight reporting package for single LSF clusters. It provides simple two-week reporting for smaller LSF clusters (about 100 hosts, 1,000 jobs/day) and shows trends for basic cluster metrics by user, project, host, resource and queue.

LSF Reports provides the following historical information about a cluster:

See Platform LSF Reports Reference for installation and configuration instructions.

Platform LSF Reports is available as separately installable add-on packages located in /lsf_reports/ on the Platform FTP site (ftp.platform.com/).

Run-time enhancements

Thread limit enforcement

Control the job thread limit as you do other limits. Use bsub -T to limit the number of concurrent threads for the whole job; the default is no limit. In the queue, set THREADLIMIT in lsb.queues to limit the number of concurrent threads that can be part of a job. Exceeding the limit causes the job to terminate.

See Runtime Resource Usage Limits for more information.
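Because the limit applies to the whole job, the check sums threads across all of the job's processes. The following is a hypothetical sketch of that check, not LSF's enforcement code; per-process thread counts keyed by PID are an assumed input, and None stands for the documented default of no limit.

```python
from typing import Optional


def exceeds_thread_limit(threads_per_process: dict[int, int],
                         thread_limit: Optional[int]) -> bool:
    """True if the job's total concurrent threads exceed the limit."""
    if thread_limit is None:            # default: no thread limit
        return False
    return sum(threads_per_process.values()) > thread_limit


print(exceeds_thread_limit({1001: 4, 1002: 3}, 8))     # False: 7 threads
print(exceeds_thread_limit({1001: 4, 1002: 5}, 8))     # True: 9 threads
print(exceeds_thread_limit({1001: 400}, None))         # False: no limit set
```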

Non-normalized job run time limit

Provides consistent job run time limits no matter which host runs the job. With a non-normalized run limit configured, job run time is not normalized by the host CPU factor.

If ABS_RUNLIMIT=Y is defined in lsb.params, the run time limit is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted with a run limit.

See Runtime Resource Usage Limits for more information.
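The difference ABS_RUNLIMIT makes can be shown with a little arithmetic. This is a simplified model, stated as an assumption rather than taken from LSF source: a normalized limit is scaled by the ratio of the normalization host's CPU factor to the execution host's CPU factor, so a faster host (larger CPU factor) gets less wall-clock time, while an absolute limit is used unchanged on every host. The function and parameter names are hypothetical.

```python
def effective_run_limit(limit_min: float, norm_cpu_factor: float,
                        exec_cpu_factor: float, abs_runlimit: bool) -> float:
    """Wall-clock minutes a job may run on the execution host (simplified)."""
    if abs_runlimit:
        # ABS_RUNLIMIT=Y: absolute wall-clock limit, same on every host.
        return limit_min
    # Normalized limit: scale by relative host speed.
    return limit_min * norm_cpu_factor / exec_cpu_factor


print(effective_run_limit(60, 1.0, 2.0, abs_runlimit=False))  # 30.0
print(effective_run_limit(60, 2.0, 1.0, abs_runlimit=False))  # 120.0
print(effective_run_limit(60, 1.0, 2.0, abs_runlimit=True))   # 60
```

Under this model, the same 60-minute limit can mean 30 or 120 wall-clock minutes depending on where the job lands; with ABS_RUNLIMIT=Y it means 60 minutes everywhere.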

Resource allocation limit display (blimits command)

Improves visibility of resource allocation limits. If your job is pending because a configured resource allocation limit has been reached, you can find out which limits may be blocking it.

Use the blimits command to show the dynamic counters of each resource allocation limit configured in lsb.resources.

See Resource Allocation Limits for more information.
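The dynamic counters that blimits reports can be thought of as used/limit pairs. The hypothetical sketch below shows how such counters identify the limits that block a pending job; the limit names, the SLOTS and MEM resource labels, and the blocking rule (current usage plus the job's request exceeds the limit) are assumptions for illustration, not the blimits output format or LSF's scheduling logic.

```python
from dataclasses import dataclass


@dataclass
class LimitCounter:
    name: str       # hypothetical limit name
    resource: str   # illustrative resource label, e.g. "SLOTS" or "MEM"
    used: int       # current usage counted against the limit
    limit: int      # configured maximum


def blocking_limits(counters: list[LimitCounter],
                    requested: dict[str, int]) -> list[str]:
    """Names of limits the job would push past their configured maximum."""
    return [c.name for c in counters
            if c.used + requested.get(c.resource, 0) > c.limit]


counters = [LimitCounter("user_slots", "SLOTS", 10, 10),
            LimitCounter("proj_mem", "MEM", 2000, 4096),
            LimitCounter("host_slots", "SLOTS", 3, 8)]
print(blocking_limits(counters, {"SLOTS": 1, "MEM": 1024}))  # ['user_slots']
```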



Upgrade and Compatibility Notes

UPGRADE document

To upgrade to LSF Version 6.0, follow the steps in upgrade.html.

API Compatibility between LSF 5.x and Version 6.0

Full backward compatibility: your applications will run under LSF Version 6.0 without changing any code.

The Platform LSF Version 6.0 API is fully compatible with the LSF Version 5.x and Version 4.x API. An application linked with the LSF Version 5.x or Version 4.x library will run under LSF Version 6.0 without relinking.

To take full advantage of new Platform LSF Version 6.0 features, you should recompile your existing LSF applications with LSF Version 6.0.

Server host compatibility

Platform LSF

You must upgrade the LSF master hosts in your cluster to Version 6.0.

LSF 5.x servers are compatible with Version 6.0 master hosts. All LSF 5.x features are supported by 6.0 master hosts except:

To use new features introduced in Platform LSF Version 6.0, you must upgrade all hosts in your cluster to 6.0.

Platform LSF MultiCluster

You must upgrade the LSF master hosts in all clusters to Version 6.0.

New configuration parameters and environment variables

The following new parameters and environment variables have been added for LSF Version 6.0:

lsb.hosts

EXIT_RATE specifies a threshold for exited jobs, expressed as the number of jobs that exit per minute

lsb.params

lsb.queues

Environment variables

New command options and output

The following command options and output have changed for LSF Version 6.0:

bacct

badmin

bhist

-l displays:

bhosts

bjobs

bkill

bmod

bqueues

-l displays:

bresume

-g job_group_name resumes only jobs in the specified job group

brsvadd

-R selects hosts for the reservation according to the specified resource requirements

bstop

bsub

New files added to installation

The following new files have been added to the Platform LSF Version 6.0 installation:

Symbolic links to LSF files


If your installation uses symbolic links to other files in these directories, you must manually create links to these new files.

New accounting and job event fields

The following fields have been added to lsb.acct and lsb.events:

lsb.acct

lsb.events



Learning About Platform Products

World Wide Web and FTP

The latest information about all supported releases of Platform LSF is available on the Platform Web site at www.platform.com. Look in the Online Support area for current README files, Release Notes, Upgrade Notices, Frequently Asked Questions (FAQs), Troubleshooting, and other helpful information.

The Platform FTP site (ftp.platform.com) also provides current README files, Release Notes, and Upgrade information for all supported releases of Platform LSF.

Visit the Platform User Forum at www.platformusers.net to discuss workload management and strategies pertaining to distributed and Grid Computing.

If you have problems accessing the Platform web site or the Platform FTP site, contact support@platform.com.

Platform training

Platform's Professional Services training courses can help you gain the skills necessary to effectively install, configure and manage your Platform products. Courses are available for both new and experienced users and administrators at our corporate headquarters and Platform locations worldwide.

Customized on-site course delivery is also available.

Find out more about Platform Training at www.platform.com/training, or contact Training@platform.com for details.

README files, release notes, and UPGRADE

Before installing LSF, be sure to read the files named readme.html and release_notes.html. To upgrade to Version 6.0, follow the steps in upgrade.html.

You can also view these files from the Download area of the Platform Online Support Web page.

Platform documentation

Documentation for Platform products is available in HTML and PDF format on the Platform Web site at www.platform.com/services/support/docs_home.asp.



Technical Support

Contact Platform Computing or your LSF vendor for technical support.

Email

support@platform.com

World Wide Web

www.platform.com

Phone

Toll-free phone:

1-877-444-4LSF (+1 877 444 4573)

Mail

Platform Support
Platform Computing Corporation
3760 14th Avenue
Markham, Ontario
Canada L3R 3T7

When contacting Platform, please include the full name of your company.

We'd like to hear from you

If you find an error in any Platform documentation, or you have a suggestion for improving it, please let us know:

Email

doc@platform.com

Mail

Information Development
Platform Computing Corporation
3760 14th Avenue
Markham, Ontario
Canada L3R 3T7

Be sure to tell us:





      Date Modified: January 12, 2004
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.