Using the Clusters

Cluster layout

Like many supercomputer facilities, our clusters each have a single head node and many compute nodes. The head node hosts the shared HOME directories, runs the job scheduling software, and provides login access to the cluster. The compute nodes provide the CPUs, GPUs, memory, and disk space for running jobs. Jobs are submitted to specific queues that are managed by the scheduler. More specific requirements, such as number of CPU cores and amount of RAM, are also specified when submitting a job to any of the queues.

Queues

Kestrel has the following queues available for research jobs:

  • short queue → For short jobs taking less than 30 minutes and using a maximum of 2 CPUs.
  • default queue → All other jobs. This is where you will most likely submit your jobs.
  • specific group queue → Some research groups have reserved-access queues.

Starling has the following queues available for research jobs:

  • gpudefault queue → All regular jobs.
  • specific group queue → Some research groups have reserved-access queues.

Sunbird has a single queue which accepts all classroom-related jobs.

Phoenix is the previous-generation research cluster. Use it to access group-owned compute nodes or to finish existing projects. It has the following queues available for research jobs:

  • default queue → All other jobs. This is where you will most likely submit your jobs.
  • specific group queue → Some research groups have reserved-access queues.

Software Available

Please see the list of software installed for general availability.

You are also encouraged to install software you wish to run on the cluster to your home directory.

Submitting jobs

Most software packages installed on kestrel and sunbird have a submission script that makes submitting input for that program simple. These scripts are in /usr/local/q_scripts. Using these scripts is the easiest way to submit jobs to the cluster.

For example, Gaussian16 jobs can be submitted to the cluster using the qg16 command:

qg16 inputfile

Each script has several options, such as number of cores to run across and which queue to submit to. To run a 2 core CFOUR job in the short queue you would use the following command:

qcfour -q short -np 2 jobname.cfour

Usage options for each script can be found by running the script without any arguments.

Running other software

You can use the script at /usr/local/q_scripts/qblank on kestrel as a template to create a submission script for almost any piece of software that runs non-interactively (doesn’t require keyboard input at the time it is run).

Follow the comments in the qblank script to insert the commands to run your software.

Please make sure you use local scratch space for files created during calculation. If you have questions about this, please email clusteradmin@chem.wisc.edu to set up a brief meeting to learn how to do this properly.
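As a rough illustration of what such a submission script tends to look like, here is a generic Torque/PBS skeleton. It is not the contents of qblank, and the job name, queue name, core count, and program name are placeholder assumptions. Where the comments in qblank differ, follow qblank.

```shell
#!/bin/bash
# Generic PBS job script sketch (placeholder values, not the qblank template).
#PBS -N myjob
#PBS -q default
#PBS -l nodes=1:ppn=2
#PBS -j oe

# PBS_O_WORKDIR is set by the scheduler to the directory qsub was run from.
# Fall back to the current directory when trying the script outside a job.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "Job ${PBS_JOBID:-none} running on $(uname -n) in $workdir"

# Replace the line below with the command that runs your software.
# Keep intermediate files in /scratch rather than in your home directory.
# ./myprogram input.dat > output.log
```

A script like this would be submitted with qsub followed by the script's file name.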

Interactive sessions

Any time you want to run software directly at the command line, for example to provide interactive input at runtime, you should use an interactive session on a compute node.

You can start an interactive shell session on a compute node with the following commands:

qsub -I -l nodes=1:ppn=NUMBEROFCORES

or

qsub -I -q short

Other scheduler commands

qstat (Check queue status)
qstat -u username (check queue status for jobs belonging to username)
qdel JOBID (remove your job with job number JOBID from the queue)

On clusters scheduled with Slurm, interactive jobs that need GPUs can be started with the srun command.

For example, request an interactive job with access to 4 GPU cards with this command:

srun --gres gpu:4 --pty bash

Other scheduler commands

squeue (Check queue status)
squeue --user=username (check queue status for jobs belonging to username)
scancel JOBID (remove your job with job number JOBID from the queue)

Guidelines, Policies, and Limits

Queue limits

Per-user limits on simultaneous core usage are displayed when you connect to each of the clusters via ssh. The limit is typically 200-400 cores. Sunbird has a 96 hour time limit for classroom jobs; the other clusters do not have a job time limit.

Disk quotas

On kestrel, each user is limited to 1TB of disk space in /home. Home directories should primarily be used for job submission preparation and temporary result storage. They are not a permanent repository for completed research.

Scratch space usage

While your HOME directory is identical across all the nodes, scratch space (/scratch) is unique on each node. Using your HOME directory as scratch space for a job will slow your jobs and may slow the jobs of others or make the head node unstable. DO NOT USE YOUR HOME DIRECTORY FOR SCRATCH. If you are not using the predefined submission scripts please make proper use of /scratch for intermediate file creation during job runs.
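For hand-written submission scripts, the usual pattern is to copy inputs to node-local scratch, run there, copy results back, and clean up. The sketch below shows one way to do that. The names SCRATCH_ROOT and run_in_scratch are illustrative assumptions, not something provided on the clusters, and the predefined submission scripts may handle scratch differently.

```shell
#!/bin/bash
# Sketch: run a command in node-local scratch and copy results back.
# SCRATCH_ROOT and run_in_scratch are illustrative names, not site-provided.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

run_in_scratch() {
    # $1 = directory holding the input files; remaining args = command to run
    local submit_dir="$1"; shift
    local user="${USER:-$(id -un)}"
    # PBS_JOBID is set by the scheduler inside a job; fall back to the PID.
    local job_tag="${PBS_JOBID:-manual.$$}"
    local work_dir="$SCRATCH_ROOT/$user/$job_tag"

    mkdir -p "$work_dir" || return 1
    # Stage input files onto the local scratch disk.
    cp -r "$submit_dir"/. "$work_dir"/ || return 1

    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$work_dir" && "$@" )
    local status=$?

    # Copy everything back, then remove the scratch directory.
    cp -r "$work_dir"/. "$submit_dir"/
    rm -rf "$work_dir"
    return "$status"
}
```

Inside a job script, a call might look like run_in_scratch "$PBS_O_WORKDIR" ./myprogram input.dat, where myprogram and input.dat are placeholders. For jobs that write very large intermediate files, copy back only the results you need rather than the whole directory.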

Other useful information

Copying files from one cluster to another

Use the scp command to copy files from one cluster system to another.
Usage for the scp command:

scp [options] file_or_folder_name destination_cluster:location

When copying directories, you must add “-r” to the [options] part of the command to recursively copy the directory contents.

Examples: (These are copying files and folders from phoenix to kestrel, so the command is run from your home directory on phoenix)

scp -r myfolder/ kestrel:/home/username
scp myfile kestrel:/home/username/project/notes

Backups

Because the content of the /home filesystem changes constantly, no global backups of the HOME directories are made. All cluster filesystems use fault-tolerant disk arrays, so the likelihood of data loss due to mechanical failure is very low. You are responsible for backing up your own data. Files cannot be restored if you delete or overwrite them.
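One simple way to take responsibility for your own backups is to bundle a finished project into a dated archive and then copy that archive off the cluster. The sketch below covers the archiving step only. The helper name archive_dir and the directory names in the usage note are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: bundle a directory into a dated, compressed archive.
# archive_dir is an illustrative helper name, not a site-provided command.
archive_dir() {
    # $1 = directory to archive, $2 = directory to write the archive into
    local src="${1%/}" dest="$2"
    local name
    name="$(basename "$src")-$(date +%Y%m%d).tar.gz"

    mkdir -p "$dest" || return 1
    # -C switches to the parent directory so the archive holds relative paths.
    tar -czf "$dest/$name" -C "$(dirname "$src")" "$(basename "$src")" || return 1

    # Print the archive path so it can be passed on to scp.
    echo "$dest/$name"
}
```

For example, archive_dir ~/project ~/archives writes an archive named after the project and today's date into ~/archives. The archive can then be copied with scp to storage that you control outside the cluster and removed from /home afterwards.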

Charging a job to a different group or fund than your default

Add the -charge [group,fund number] argument if you are using the provided submission scripts. Please provide both a group and a fund. Example:

qg16 -charge schmidt,PRJ1234 input.com

If you are writing your own submission scripts and using qsub, add custom group and funding strings using the -A flag with qsub. Example:

qsub -A schmidt,PRJ1234 script.pbs

Receiving email from the scheduler

To receive email regarding your jobs (errors, etc.) from the cluster scheduler, make a .forward file in your home directory containing your email address.
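A minimal sketch of creating that file from the command line follows. The address shown is a placeholder assumption; substitute your own. Note that this overwrites any existing .forward file.

```shell
# Placeholder address: replace with your own email address.
EMAIL="username@example.edu"

# Write the address as the only line of ~/.forward.
printf '%s\n' "$EMAIL" > "$HOME/.forward"

# Show the result to confirm the contents.
cat "$HOME/.forward"
```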

Understanding your PATH

Your PATH is an environment variable that tells the system which directories to search for executable files in response to commands you issue.

If you want to run a program that is not in your PATH you must call it using its full location, like /share/apps/myprograms/myprogram instead of just myprogram. You can also modify your PATH to search other locations not included in your default PATH. Some commands:

Display your PATH:

echo $PATH

Add /my/dir to your PATH:

export PATH=$PATH:/my/dir
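Running the export command repeatedly, for example every time a startup file is read, adds the same directory again each time. The sketch below appends a directory only if it is not already listed. The function name path_append is an illustrative assumption, and /my/dir is the same placeholder used above.

```shell
# Sketch: append a directory to PATH only when it is not already there.
# path_append is an illustrative helper name.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, leave PATH unchanged
        *) PATH="$PATH:$1" ;;     # not present, append it
    esac
    export PATH
}

path_append /my/dir
path_append /my/dir   # second call does not add a duplicate entry
```

To keep the change across logins, the function and the call can go in a shell startup file such as ~/.bashrc, assuming bash is your login shell.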

Other computing resources available

You can request allocations on other HPC/HTC resources available on campus and around the world. Here are a few: