Learn how to create, submit, and monitor jobs. Read this note before submitting parallel jobs.
The cluster is designed to run computationally intensive jobs on compute nodes. Running resource-intensive jobs on the login nodes is not permitted; instead, the user must submit a batch script or request an interactive session via the job scheduler. Henry2 uses the LSF resource management software for scheduling jobs.
A user supplies a list of resource requirements to LSF using a text file, called a batch script, or via the command line. It is necessary to specify the number of cores and the time required. Various other resource requirements may be added (e.g., memory, processor model, processor instruction set, GPU).
Sample batch scripts should be examined and modified as appropriate before submitting production jobs to LSF.
The following shows the contents of a text file called run_mycode.csh, which is a basic LSF batch script to run a serial application:
#!/bin/tcsh
#BSUB -n 1
#BSUB -W 120
#BSUB -J mycode
#BSUB -o stdout.%J
#BSUB -e stderr.%J

mycode.exe

The #BSUB directives take the place of specifying options to bsub via the command line. In this example, one core (-n 1) is being requested for a maximum of 120 minutes (-W 120). If the application were to run longer than 120 minutes, LSF would automatically terminate the job.
The -J directive is the name that LSF will display for the job. This provides a way to differentiate multiple jobs running at the same time.
The other two directives tell LSF where to put standard output and standard error messages from the job. The %J will be replaced by the job ID when the job begins. The files will be written to the directory the job is submitted from (the current working directory when the bsub command is invoked).
The following shows the contents of a text file called run_my_parallel_code.csh, which is a basic LSF batch script to run a parallel application:
#!/bin/tcsh
#BSUB -n 4
#BSUB -W 120
#BSUB -q shared_memory
#BSUB -J mycode
#BSUB -o stdout.%J
#BSUB -e stderr.%J

my_parallel_code.exe

In this example, four processor cores (-n 4) are being requested. Specifying the shared_memory queue ensures that all 4 cores are on a single node. Another way to ensure the cores are on one node is to use #BSUB -R span[hosts=1]. Use hosts=1 if the run should go to the debug queue; otherwise, use shared_memory. Verify that an application can run in distributed memory (across multiple nodes) before leaving out this resource specification. For more information, see the FAQ.
There are many ways to specify job parameters and resources, including memory requirements, processor model, supported instruction sets, and use of a GPU.
To submit a job to a compute node, the batch script run_mycode.csh is submitted to LSF using the command:

bsub < run_mycode.csh
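After submission, LSF echoes back a job ID that can then be used to track the job. A session might look like the following sketch; the job ID and queue name shown are illustrative, not guaranteed:

```shell
# Submit the batch script; LSF prints the assigned job ID and queue,
# e.g. "Job <948851> is submitted to default queue <standard>."
bsub < run_mycode.csh

# Check the status of that specific job using the returned ID.
bjobs 948851
```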
Production jobs should always be run via a batch script. It may be necessary to interact with an application at the command line to determine how to properly write the script. Users may not do this testing on a login node.
To interact with a compute node via the command line, submit a request to LSF using bsub -Is, followed by other LSF parameters such as the time and number of cores needed, and ending with tcsh. To request an interactive session using 4 cores, with all cores on the same node, and 10 minutes of wall clock time:
bsub -Is -n 4 -R "span[hosts=1]" -W 10 tcsh
Do not request more than one core for a serial job: choose -n 1. If the memory requirements are high, specify the memory required or request exclusive use of the node by using -x, e.g.,
bsub -Is -n 1 -x -W 10 tcsh
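If only additional memory is needed rather than a whole node, a memory reservation can be requested instead of -x. This is a sketch assuming LSF's standard rusage resource string; the unit of the mem value (often MB by default) depends on the cluster configuration, so verify it locally:

```shell
# Request one core with roughly 16 GB of memory reserved for the job.
# The mem value is assumed to be in MB here -- check local documentation.
bsub -Is -n 1 -R "rusage[mem=16000]" -W 10 tcsh
```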
Interactive sessions must be kept to a minimum and only used when necessary. Nodes left idle or underutilized by long running interactive sessions may be terminated, as this violates the Acceptable Use Policy.
The bjobs command is used to monitor the status of jobs after they are submitted to LSF. An LSF job status is usually in one of two states (STAT): PEND means that the job is queued and waiting for resources to become available and RUN means that the job is currently executing. Typical bjobs output looks like:
[unityID@login01 ~]$ bjobs
JOBID   USER     STAT  QUEUE     FROM_HOST    EXEC_HOST  JOB_NAME  SUBMIT_TIME
948851  unityID  RUN   standard  login01.hpc  bc2j6      myjob1    Mar 25 16:15
The first column shows the job ID.
bjobs -l JOBID
will provide detailed information about the job. For jobs in a pending state, this
will give information about why the job is pending. It is possible to make
resource requests that are impossible to satisfy; jobs that have been
pending for a long time may have made that type of request. For guidance on interpreting
the output of bjobs, see the LSF FAQ.
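For a quick summary of one's own workload, the default bjobs output can be filtered with standard tools. The snippet below is a sketch that assumes the default column layout shown above, where STAT is the third column:

```shell
# Count pending jobs: skip the header line, keep rows whose STAT is PEND.
bjobs | awk 'NR > 1 && $3 == "PEND"' | wc -l
```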
The requested wall clock time of a job can be changed with bmod -W "[new time]" [job ID]. See the IBM LSF documentation for more details.
A pending job may be removed from the queue or a running job may be terminated by using the bkill command. The job ID is used to specify which job to remove, e.g., bkill 948851. To terminate all running and pending jobs, use bkill 0.
The bqueues command provides an overall view of LSF and some indication of how busy various queues may be. The following shows that in the long queue, there are a maximum of 200 job slots (or tasks) available. Of the 161 tasks submitted, 129 are running and 32 are pending.
[unityID@login04 ~]$ bqueues long
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
long        45    Open:Active  200  32    -     -     161    32    129   0
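Running bqueues with no arguments lists every queue, which can help when deciding where to submit. Both invocations below are standard LSF options:

```shell
# Show the current load on all queues.
bqueues

# Show only the queues a particular user is allowed to submit to.
bqueues -u unityID
```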
The cluster status page provides information about available nodes and can help inform resource requests.
#!/bin/tcsh
#BSUB -n 32
#BSUB -J test1_hydro
#BSUB -W 2:30
#BSUB -R "select[avx2]"
#BSUB -o test1_hydro.out.%J
#BSUB -e test1_hydro.err.%J
#BSUB -q standard_ib

module load PrgEnv-intel
mpirun ./hydro.exe
This will run an MPI code called hydro.exe. It will launch 32 MPI tasks and is not expected to run longer than 150 minutes. To ensure all nodes are of the same type (standard) and the network used is InfiniBand (ib), the standard_ib queue is specified. The code was compiled with Intel MPI, and the runtime environment must be set to match the compile environment. It was also compiled with optimizations for the AVX2 instruction set, so it may not run on older nodes.
#!/bin/tcsh
#BSUB -n 6                  # Number of MPI tasks
#BSUB -R span[ptile=2]      # MPI processes per node
#BSUB -x                    # Exclusive use of nodes
#BSUB -J chemtest1          # Name of job
#BSUB -W 2:30               # Wall clock time
#BSUB -o chemtest1.out.%J   # Standard out
#BSUB -e chemtest1.err.%J   # Standard error
#BSUB -q standard_ib        # Queue

module load openmpi-gcc     # Set environment
setenv OMP_NUM_THREADS 4
mpirun ./chemtest1.exe
This will run a hybrid MPI-OpenMP code called chemtest1.exe. It will launch 6 MPI tasks and is not expected to run longer than 150 minutes. Two tasks (span[ptile=2]) will be placed on each of 3 nodes. Four threads will be spawned for each MPI task, so each node will need to have a minimum of 8 cores. LSF schedules by the number of tasks requested, so -x must be used to prevent contention with other jobs or overloading the node. To ensure all nodes are of the same type (standard) and the network used is InfiniBand (ib), the standard_ib queue is specified. The code was compiled with GNU OpenMPI, and the runtime environment must be set to match the compile environment.
#!/bin/tcsh
#BSUB -n 1
#BSUB -W 30
#BSUB -q gpu
#BSUB -gpu "num=1:mode=exclusive_process:mps=yes"
#BSUB -o out.%J
#BSUB -e err.%J

module load PrgEnv-pgi
module load cuda
./nnetworks.exe
This will run a CUDA code called nnetworks.exe. It will use one core and is not expected to run longer than 30 minutes. Using the gpu queue will place the job on a GPU node, and the -gpu specifier will allow the use of one GPU on the node (num=1). The code was compiled using PGI and the CUDA libraries, and the runtime environment must be set to match the compile environment.