Using the Clusters

Cluster layout

Like many supercomputer facilities, our clusters each have a single head node and many compute nodes. The head node hosts the shared HOME directories, runs the job scheduling software, and provides login access to the cluster. The compute nodes provide the CPUs, GPUs, memory, and disk space for running jobs. Jobs are submitted to specific queues that are managed by the scheduler. More specific requirements, such as number of CPU cores and amount of RAM, are also specified when submitting a job to any of the queues.
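
For example, with the PBS-style scheduler used on phoenix and sunbird, a job script could be submitted to the default queue with a request for 8 cores on one node roughly like this (script.pbs is a placeholder, and the exact resource-request syntax depends on the scheduler version; the provided submission scripts described below handle these details for you):

qsub -q default -l nodes=1:ppn=8 script.pbs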

 

Queues

Available queues

Sunbird has a single queue which accepts all classroom-related jobs.
Starling has a single queue which accepts all GPU jobs.
Phoenix has the following queues available for research jobs:

  • short queue → For short jobs taking less than 30 minutes and using a maximum of 2 CPUs.
  • default queue → All other jobs. This is where you will most likely submit your jobs.

There are also several dedicated queues that allow certain groups to submit jobs to reserved nodes.

Queue Limits

On phoenix, there is a per-user simultaneous core usage limit. Please see the welcome banner when you connect to phoenix via ssh for the current limit. This limit is usually in the neighborhood of 400 cores per person.

The phoenix cluster is a mixture of nodes with between 4GB and 6GB of RAM available per core. Please keep these limits in mind when formulating your input.
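For example, an 8-core job should request no more than about 32GB of RAM in case it is scheduled on a 4GB-per-core node.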

GPUs on starling

Starling nodes typically have 4 GPU cards each. The GPUs are NVIDIA 1080Ti or RTX 2080 cards.

 

Software Available

Please see the list of software installed for general availability.

You are also encouraged to install software you wish to run on the cluster to your home directory.
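
For example, a typical source package that uses configure/make could be installed under your home directory like this (the package name and install path are placeholders):

tar xzf mypackage-1.0.tar.gz
cd mypackage-1.0
./configure --prefix=$HOME/software/mypackage
make
make install
export PATH=$PATH:$HOME/software/mypackage/bin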

 

Submitting jobs

From the web

For classroom calculations on sunbird, or to run Gaussian jobs on phoenix without using ssh, log into the web interface and submit your jobs using WebMO.

The information below is for command-line access.

Phoenix and Sunbird

Most software packages installed on phoenix and sunbird have a submission script that makes submitting input for that program simple. These scripts are in /usr/local/q_scripts. Using these scripts is the easiest way to submit jobs to the cluster.

For example, Gaussian16 jobs can be submitted to the cluster using the qg16 command:

qg16 inputfile

Each script has several options, such as number of cores to run across and which queue to submit to. To run a 2 core CFOUR job in the short queue you would use the following command:

qcfour -q short -np 2 jobname.cfour

Usage options for each script can be found by running the script without any arguments.

Running other software

You can use the script at /usr/local/q_scripts/qblank as a template to create a submission script for almost any piece of software that runs non-interactively (doesn’t require keyboard input at the time it is run).

Follow the comments in the qblank script to insert the commands to run your software.

Please make sure you use local scratch space for files created during calculation. If you have questions about this please email clusteradmin@chem.wisc.edu to setup a brief meeting to learn how to do this properly.
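
As a rough sketch only (follow the comments in the actual qblank template on the cluster), a non-interactive job script typically stages its work through local scratch like this; myprogram, input.dat, and the queue/core settings are placeholders:

#!/bin/bash
#PBS -q default
#PBS -l nodes=1:ppn=2

# make a job-specific directory in local scratch on the compute node
SCRATCHDIR=/scratch/$USER/$PBS_JOBID
mkdir -p $SCRATCHDIR

# copy input to scratch and run there so intermediate files stay off /home
cp $PBS_O_WORKDIR/input.dat $SCRATCHDIR/
cd $SCRATCHDIR
myprogram input.dat > output.log

# copy results back and clean up scratch
cp output.log $PBS_O_WORKDIR/
rm -rf $SCRATCHDIR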

Interactive sessions

Any time you need to run software directly at the command line, for example to provide interactive input at runtime, you should use an interactive session on a compute node.

You can start an interactive shell session on a compute node with the following commands:

qsub -I

or

qsub -I -q short

Starling

The starling cluster is still in its early phases of use. Interactive jobs that request GPUs can be started with the srun command.

For example, request an interactive job with access to 4 GPU cards with this command:

srun --gres gpu:4 --pty bash
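
Once the interactive session starts, you can confirm which GPUs were allocated to you, for example with NVIDIA's nvidia-smi utility:

nvidia-smi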

Guidelines and Policies

Scratch space usage

While your HOME directory is identical across all the nodes, scratch space (/scratch) is local to each node. Using your HOME directory as scratch space for a job will slow your jobs and may slow the jobs of others or make the head node unstable. DO NOT USE YOUR HOME DIRECTORY FOR SCRATCH. If you are not using the predefined submission scripts, please make proper use of /scratch for intermediate file creation during job runs.

Disk quotas

On phoenix, each user of the cluster is limited to 100GB of disk space in /home. Your home directory should primarily be used for job submission preparation and temporary result storage; it is not a permanent repository for completed research. If your research requires more than 100GB for job submission preparation and temporary result storage, please email clusteradmin@chem.wisc.edu to explore the possibility of an increase.
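
To check how much space your home directory is currently using, you can run the standard du utility:

du -sh $HOME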

Backups

Due to the constantly changing content of the /home filesystem, no global backups of the HOME directories are made. All cluster filesystems use fault-tolerant disk arrays, so the likelihood of data loss due to mechanical failure is very low. You are responsible for backing up your own data. Files cannot be restored if you delete or overwrite them.
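
For example, you could periodically copy important results from the cluster to your own machine with a standard tool such as rsync or scp (the hostname and paths below are placeholders):

rsync -av username@<cluster-hostname>:results/ ~/cluster-backup/results/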

 

Other useful information

Requesting nodes with large local scratch space

Each phoenix node has between 250GB and 440GB of local scratch space. You can add -bigdisk to most of the submission scripts to request nodes with local scratch spaces of 1.8TB.
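
For example, to send a Gaussian16 job to a node with the larger scratch disk:

qg16 -bigdisk inputfile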

Scheduler Commands

Phoenix and Sunbird

qstat (Check queue status)
qstat -u username (check queue status for jobs belonging to username)
qdel JOBID (remove your job with job number JOBID from the queue)

Starling

squeue (Check queue status)
squeue --user=username (check queue status for jobs belonging to username)
scancel JOBID (remove your job with job number JOBID from the queue)

Charging a job to a different group or fund than your default

Add the -charge [group,fund number] argument if you are using the provided submission scripts. Please provide both a group and a fund. Example:

q909 -charge schmidt,PRJ1234 input.com

If you are writing your own submission scripts and using qsub, add custom group and funding strings using the -A flag with qsub. Example:

qsub -A schmidt,PRJ1234 script.pbs

Receiving email from the scheduler

To receive email regarding your jobs (errors, etc) from the cluster scheduler, make a .forward file in your /home directory containing your email address.
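
For example (replace the address with your own):

echo "yourname@wisc.edu" > ~/.forward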

Understanding your PATH

Your PATH is an environment variable that tells the system which directories to search for executable files in response to commands you issue.

If you want to run a program that is not in your PATH you must call it using its full location, like /share/apps/myprograms/myprogram instead of just myprogram. You can also modify your PATH to search other locations not included in your default PATH. Some commands:

Display your PATH:

echo $PATH

Add /my/dir to your PATH:

export PATH=$PATH:/my/dir
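
To make the addition permanent, assuming your login shell is bash, append the same export line to your ~/.bashrc:

echo 'export PATH=$PATH:/my/dir' >> ~/.bashrc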

Cluster Specifications

Phoenix currently has approximately 1900 Intel-based CPU cores, all connected via InfiniBand, which allows for high-speed, low-latency communication between the compute nodes. Depending on the compute nodes assigned to a specific job, between 4GB and 6GB of RAM is available per core.

Starling currently has a growing number of GPU nodes with a mix of NVIDIA 1080Ti and RTX 2080 GPU cards.

Sunbird currently has approximately 200 Intel-based CPU cores available for use in the classroom.

 

Other computing resources available

You can request allocations on other HPC/HTC resources available on campus and around the world. Here are a few: