- Viewing Cluster Information
- Default Directory Structures
- Cluster Administrators
- Controlling Daemons
- Controlling mbatchd
- Reconfiguring Your Cluster
Viewing Cluster Information
LSF provides commands for users to get information about the cluster. Cluster information includes the cluster master host, cluster name, cluster resource definitions, cluster administrator, and so on.
| To view the ... | Run ... |
|---|---|
| Version of LSF | lsid |
| Cluster name | lsid |
| Current master host | lsid |
| Cluster administrators | lsclusters |
| Configuration parameters | bparams |
Viewing LSF version, cluster name, and current master host
Use the lsid command to display the version of LSF, the name of your cluster, and the current master host:

    % lsid
    Platform LSF 6.0, Oct 31 2003
    Copyright 1992-2004 Platform Computing Corporation

    My cluster name is cluster1
    My master name is hostA

Viewing cluster administrators
Use the lsclusters command to find out who your cluster administrator is and see a summary of your cluster:

    % lsclusters
    CLUSTER_NAME   STATUS   MASTER_HOST   ADMIN      HOSTS   SERVERS
    cluster1       ok       hostA         lsfadmin   6       6

If you are using the LSF MultiCluster product, you will see one line for each of the clusters that your local cluster is connected to in the output of lsclusters.

Viewing configuration parameters
Use the bparams command to display the generic configuration parameters of LSF. These include default queues, default host or host model for CPU speed scaling, job dispatch interval, job checking interval, job accepting interval, etc.

    % bparams
    Default Queues: normal idle
    Default Host Specification: DECAXP
    Job Dispatch Interval: 20 seconds
    Job Checking Interval: 15 seconds
    Job Accepting Interval: 20 seconds

Use the -l option of bparams to display the information in long format, which gives a brief description of each parameter as well as the name of the parameter as it appears in lsb.params.

    % bparams -l
    System default queues for automatic queue selection:
        DEFAULT_QUEUE = normal idle
    The interval for dispatching jobs by master batch daemon:
        MBD_SLEEP_TIME = 20 (seconds)
    The interval for checking jobs by slave batch daemon:
        SBD_SLEEP_TIME = 15 (seconds)
    The interval for a host to accept two batch jobs subsequently:
        JOB_ACCEPT_INTERVAL = 1 (* MBD_SLEEP_TIME)
    The idle time of a host for resuming pg suspended jobs:
        PG_SUSP_IT = 180 (seconds)
    The amount of time during which finished jobs are kept in core:
        CLEAN_PERIOD = 3600 (seconds)
    The maximum number of finished jobs that are logged in current event file:
        MAX_JOB_NUM = 2000
    The maximum number of retries for reaching a slave batch daemon:
        MAX_SBD_FAIL = 3
    The number of hours of resource consumption history:
        HIST_HOURS = 5
    The default project assigned to jobs.
        DEFAULT_PROJECT = default
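The long-format output pairs each description with a NAME = value line, and JOB_ACCEPT_INTERVAL is expressed as a multiple of MBD_SLEEP_TIME. The following is a minimal sketch of how that output could be read into a dictionary and how the effective job accepting interval could be derived. The function names `parse_bparams_long` and `job_accept_interval_seconds` are hypothetical, and the exact line layout of the real output is an assumption; the pattern only relies on the NAME = value form.

```python
import re

# Sample text taken from the `bparams -l` example above (first four parameters).
# The line breaks and indentation are assumed, not guaranteed.
SAMPLE = """\
System default queues for automatic queue selection:
    DEFAULT_QUEUE = normal idle
The interval for dispatching jobs by master batch daemon:
    MBD_SLEEP_TIME = 20 (seconds)
The interval for checking jobs by slave batch daemon:
    SBD_SLEEP_TIME = 15 (seconds)
The interval for a host to accept two batch jobs subsequently:
    JOB_ACCEPT_INTERVAL = 1 (* MBD_SLEEP_TIME)
"""

# NAME = value, optionally followed by a parenthesised unit or multiplier note.
_PARAM_RE = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*=\s*(.*?)\s*(?:\((.*)\))?\s*$")


def parse_bparams_long(text):
    """Return {parameter name: value string} for every NAME = value line."""
    params = {}
    for line in text.splitlines():
        match = _PARAM_RE.match(line)
        if match:
            name, value, _note = match.groups()
            params[name] = value
    return params


def job_accept_interval_seconds(params):
    """JOB_ACCEPT_INTERVAL is a multiplier applied to MBD_SLEEP_TIME."""
    return int(params["JOB_ACCEPT_INTERVAL"]) * int(params["MBD_SLEEP_TIME"])


if __name__ == "__main__":
    parsed = parse_bparams_long(SAMPLE)
    print(parsed)
    print(job_accept_interval_seconds(parsed), "seconds")
```

With the sample values, 1 * 20 gives the same 20 seconds that the short-format output reports as the Job Accepting Interval.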
Default Directory Structures
UNIX
The following diagram shows a typical directory structure for a new UNIX installation. Depending on which products you have installed and platforms you have selected, your directory structure may vary.
![]()
Pre-4.2 UNIX installation directory structure
The following diagram shows a cluster installed with lsfsetup. It uses the pre-4.2 directory structure.
![]()
Windows
The following diagram shows the directory structure for a default Windows installation.
![]()
Cluster Administrators
Primary LSF administrator
Required. The first cluster administrator, specified during installation. The primary LSF administrator account owns the configuration and log files. The primary LSF administrator has permission to perform cluster-wide operations, change configuration files, reconfigure the cluster, and control jobs submitted by all users.
Cluster administrators
Optional. May be configured during or after installation.
Cluster administrators can perform administrative operations on all jobs and queues in the cluster. Cluster administrators have the same cluster-wide operational privileges as the primary LSF administrator except that they do not have permission to change LSF configuration files.
Adding cluster administrators
- In the ClusterAdmins section of lsf.cluster.cluster_name, specify the list of cluster administrators following ADMINISTRATORS, separated by spaces. The first administrator in the list is the primary LSF administrator. All others are cluster administrators. You can specify user names and group names. For example:

      Begin ClusterAdmins
      ADMINISTRATORS = lsfadmin admin1 admin2
      End ClusterAdmins

- Save your changes.
- Run lsadmin reconfig to reconfigure LIM.
- Run badmin mbdrestart to restart mbatchd.
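To make the "first entry is the primary administrator" rule concrete, here is a small sketch that reads a ClusterAdmins section and splits the ADMINISTRATORS list accordingly. The function name `parse_cluster_admins` is hypothetical, and treating `#` as a comment marker is an assumption about the file format. It is an illustration of the rule, not a full parser for lsf.cluster.cluster_name.

```python
# Sample section copied from the example above.
SAMPLE = """\
Begin ClusterAdmins
ADMINISTRATORS = lsfadmin admin1 admin2
End ClusterAdmins
"""


def parse_cluster_admins(config_text):
    """Return (primary_admin, other_admins) from a ClusterAdmins section.

    Returns (None, []) when no ADMINISTRATORS line is found in the section.
    """
    in_section = False
    for raw in config_text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = " ".join(raw.split("#", 1)[0].split())
        if not line:
            continue
        if line.lower() == "begin clusteradmins":
            in_section = True
        elif line.lower() == "end clusteradmins":
            in_section = False
        elif in_section and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "ADMINISTRATORS":
                names = value.split()
                if names:
                    # First name is the primary LSF administrator,
                    # the remaining names are cluster administrators.
                    return names[0], names[1:]
    return None, []


if __name__ == "__main__":
    primary, others = parse_cluster_admins(SAMPLE)
    print("primary:", primary)
    print("cluster administrators:", others)
```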
Controlling Daemons
Prerequisites
To control all daemons in the cluster, you must:
- Be logged on as root or a user listed in the /etc/lsf.sudoers file. See the Platform LSF Reference for configuration details of lsf.sudoers.
- Be able to run the rsh or ssh commands across all LSF hosts without having to enter a password. See your operating system documentation for information about configuring the rsh and ssh commands. The shell command specified by LSF_RSH in lsf.conf is used before rsh is tried.
Daemon commands
The following is an overview of commands you use to control LSF daemons.
sbatchd
Restarting sbatchd on a host does not affect jobs that are running on that host.
If sbatchd is shut down, the host is not available to run new jobs. Existing jobs running on that host continue, but the results are not sent to the user until sbatchd is restarted.
LIM and RES
Jobs running on the host are not affected by restarting the daemons.
If a daemon is not responding to network connections, lsadmin displays an error message with the host name. In this case you must kill and restart the daemon manually.
If the LIM on the current master host is shut down, another host automatically takes over as master.
If the RES is shut down while remote interactive tasks are running on the host, the running tasks continue but no new tasks are accepted.
Controlling mbatchd
When you reconfigure the cluster with the command badmin reconfig, mbatchd is not restarted. Only configuration files are reloaded.
If you add a host to a host group, or a host to a queue, the new host is not recognized by jobs that were submitted before you reconfigured. If you want the new host to be recognized, you must restart mbatchd.
Restarting mbatchd
Run badmin mbdrestart. LSF checks configuration files for errors and prints the results to stderr. If no errors are found, the following occurs:
- Configuration files are reloaded.
- mbatchd is restarted.
- Events in lsb.events are reread and replayed to recover the running state of the last mbatchd.
While mbatchd is restarting, it is unavailable to service requests. In large clusters where there are many events in lsb.events, restarting mbatchd can take some time. To avoid replaying events in lsb.events, use the command badmin reconfig.
Logging a comment when restarting mbatchd
Use the -C option of badmin mbdrestart to log an administrator comment in lsb.events. For example,

    % badmin mbdrestart -C "Configuration change"

The comment text Configuration change is recorded in lsb.events.
Use badmin hist or badmin mbdhist to display administrator comments for mbatchd restart.
Shutting down mbatchd
- Run badmin hshutdown to shut down sbatchd on the master host. For example:

      % badmin hshutdown hostD
      Shut down slave batch daemon on <hostD> .... done

- Run badmin mbdrestart:

      % badmin mbdrestart
      Checking configuration files ...
      No errors found.

  This causes mbatchd and mbschd to exit. mbatchd cannot be restarted, because sbatchd is shut down. All LSF services are temporarily unavailable, but existing jobs are not affected. When mbatchd is later started by sbatchd, its previous status is restored from the event log file and job scheduling continues.
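The choice between badmin reconfig and badmin mbdrestart described in this section can be summarized as a small decision helper. This is only a sketch: the function name `mbatchd_command` and its parameters are hypothetical, and it uses the -C option only with mbdrestart because that is the only usage this section documents.

```python
def mbatchd_command(new_hosts_for_existing_jobs, comment=None):
    """Build the badmin argument list suggested by the rules in this section.

    new_hosts_for_existing_jobs: True if hosts were added to a host group or
    queue and jobs submitted before the change must recognize them.
    comment: optional administrator comment, applied to mbdrestart only.
    """
    if new_hosts_for_existing_jobs:
        # A restart is required; events in lsb.events are replayed.
        command = ["badmin", "mbdrestart"]
        if comment:
            command += ["-C", comment]
        return command
    # Reloading configuration files is enough and avoids replaying events.
    return ["badmin", "reconfig"]


if __name__ == "__main__":
    print(mbatchd_command(True, "Configuration change"))
    print(mbatchd_command(False))
```

The sketch deliberately stops at building the argument list. Both commands ask for interactive confirmation, so how to run them unattended is left open here.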
Reconfiguring Your Cluster
After changing LSF configuration files, you must tell LSF to reread the files to update the configuration. The commands you can use to reconfigure a cluster are:
- lsadmin reconfig
- badmin reconfig
- badmin mbdrestart
The reconfiguration commands you use depend on which files you change in LSF. The following table is a quick reference.
Reconfiguring the cluster with lsadmin and badmin
- Log on to the host as root or the LSF administrator.
- Run lsadmin reconfig to reconfigure LIM:

      % lsadmin reconfig
      Checking configuration files ...
      No errors found.

      Do you really want to restart LIMs on all hosts? [y/n] y
      Restart LIM on <hosta> ...... done
      Restart LIM on <hostc> ...... done
      Restart LIM on <hostd> ...... done

  The lsadmin reconfig command checks for configuration errors.
  If no errors are found, you are asked to confirm that you want to restart lim on all hosts and lim is reconfigured. If fatal errors are found, reconfiguration is aborted.
- Run badmin reconfig to reconfigure mbatchd:

      % badmin reconfig
      Checking configuration files ...
      No errors found.

      Do you want to reconfigure? [y/n] y
      Reconfiguration initiated

  The badmin reconfig command checks for configuration errors.
  If no fatal errors are found, you are asked to confirm reconfiguration. If fatal errors are found, reconfiguration is aborted.
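When many hosts are involved, it can help to check the per-host lines of the lsadmin reconfig output programmatically. The sketch below extracts the host name and status word from each "Restart LIM on <host>" entry in captured output. The function names are hypothetical, and the only status word confirmed by this document is "done"; anything else is simply reported as not done.

```python
import re

# Sample text taken from the lsadmin reconfig example above.
SAMPLE = """\
Checking configuration files ...
No errors found.

Do you really want to restart LIMs on all hosts? [y/n] y
Restart LIM on <hosta> ...... done
Restart LIM on <hostc> ...... done
Restart LIM on <hostd> ...... done
"""

# "Restart LIM on <host> ...... status": capture the host and the status word.
_RESTART_RE = re.compile(r"Restart LIM on <([^>]+)>\s*\.+\s*(\w+)")


def lim_restart_results(output):
    """Return {host: status word} for every LIM restart entry in the output."""
    return {host: status for host, status in _RESTART_RE.findall(output)}


def hosts_not_done(output):
    """Return hosts whose restart entry reports anything other than 'done'."""
    return [host for host, status in lim_restart_results(output).items()
            if status != "done"]


if __name__ == "__main__":
    print(lim_restart_results(SAMPLE))
    print("needs attention:", hosts_not_done(SAMPLE))
```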
Reconfiguring the cluster by restarting mbatchd
Run badmin mbdrestart to restart mbatchd:

    % badmin mbdrestart
    Checking configuration files ...
    No errors found.

    Do you want to restart? [y/n] y
    MBD restart initiated

The badmin mbdrestart command checks for configuration errors.
If no fatal errors are found, you are asked to confirm mbatchd restart. If fatal errors are found, the command exits without taking any action.
If the lsb.events file is large, or many jobs are running, restarting mbatchd can take some time. In addition, mbatchd is not available to service requests while it is restarted.
Viewing configuration errors
You can view configuration errors by using the following commands:
This reports all errors to your terminal.
How reconfiguring the cluster affects licenses
If the license server goes down, LSF can continue to operate for a period of time until it attempts to renew licenses.
Reconfiguring causes LSF to renew licenses. If no license server is available, LSF will not reconfigure the system because the system would lose all its licenses and stop working.
If you have multiple license servers, reconfiguration will proceed as long as LSF can contact at least one license server. In this case, LSF will still lose the licenses on servers that are down, so LSF may have fewer licenses available after reconfiguration.
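The licensing rule above can be illustrated with a toy model. It is not how LSF implements license renewal; it only restates the two cases in code. The function name `licenses_after_reconfig`, the server names, and the license counts are all hypothetical.

```python
def licenses_after_reconfig(servers):
    """Toy model of the rule described above.

    servers maps a license server name to (reachable, license_count).
    Returns None when no server is reachable, meaning reconfiguration would
    not proceed. Otherwise returns the licenses held on reachable servers,
    since licenses on servers that are down are lost.
    """
    reachable_counts = [count for up, count in servers.values() if up]
    if not reachable_counts:
        return None
    return sum(reachable_counts)


if __name__ == "__main__":
    # Hypothetical servers: one up with 10 licenses, one down with 6.
    print(licenses_after_reconfig({"lic1": (True, 10), "lic2": (False, 6)}))
    # No server reachable: reconfiguration is refused.
    print(licenses_after_reconfig({"lic1": (False, 10), "lic2": (False, 6)}))
```

In the first case the cluster reconfigures but ends up with 10 of its 16 licenses. In the second case reconfiguration does not happen, so the cluster keeps running on its current licenses until the next renewal attempt.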
Date Modified: January 12, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.