Cluster layout
Like many supercomputer facilities, our clusters each have a single head node and many compute nodes. The head node hosts the shared HOME directories, runs the job scheduling software, and provides login access to the cluster. The compute nodes provide the CPUs, GPUs, memory, and disk space for running jobs. Jobs are submitted to specific queues managed by the scheduler. Additional requirements, such as the number of CPU cores and the amount of RAM, are also specified when submitting a job to any of the queues.
Queues
Kestrel has the following queues available for research jobs:
- short queue → For short jobs taking less than 30 minutes and using a maximum of 2 CPUs.
- default queue → All other jobs. This is where you will most likely submit your jobs.
Starling has the following queues available for research jobs:
- gpudefault queue → All regular jobs.
- specific group queue → Some research groups have reserved-access queues.
Owl has a single queue which accepts all classroom-related jobs.
Software Available
Please see the list of software installed for general availability.
You are also encouraged to install software you wish to run on the cluster in your home directory.
Submitting jobs
Most software packages installed on kestrel have a submission script that makes submitting input for that program simple. These scripts are in /usr/local/q_scripts. Using these scripts is the easiest way to submit jobs to the cluster.
For example, Gaussian16 jobs can be submitted to the cluster using the qg16 command:
qg16 inputfile
Each script has several options, such as the number of cores to run across and which queue to submit to. To run a 2-core CFOUR job in the short queue you would use the following command:
qcfour -q short -np 2 jobname.cfour
Usage options for each script can be found by running the script without any arguments.
Running other software
You can use the script at /usr/local/q_scripts/qblank on kestrel as a template to create a submission script for almost any piece of software that runs non-interactively (doesn’t require keyboard input at the time it is run).
Follow the comments in the qblank script to insert the commands to run your software.
Please make sure you use local scratch space for files created during a calculation. If you have questions about this, please email clusteradmin@chem.wisc.edu to set up a brief meeting to learn how to do this properly.
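To illustrate the general pattern only (this is a rough sketch: the program name, file names, and core count below are placeholders, and the comments in qblank itself take precedence), a qblank-derived script on kestrel might look something like this, using the Torque/PBS resource syntax shown elsewhere on this page and node-local /scratch:
#!/bin/bash
#PBS -q default
#PBS -l nodes=1:ppn=4
# Create a job-specific directory in node-local scratch and work there
SCRATCHDIR=/scratch/$USER/$PBS_JOBID
mkdir -p "$SCRATCHDIR"
cp "$PBS_O_WORKDIR"/input.dat "$SCRATCHDIR"/
cd "$SCRATCHDIR"
# Run the software non-interactively (replace myprogram with the real command)
myprogram input.dat > output.log
# Copy results back to the directory you submitted from, then clean up scratch
cp output.log "$PBS_O_WORKDIR"/
rm -rf "$SCRATCHDIR"
A script like this would typically be submitted with qsub (for example, qsub myscript.sh), but follow the comments in qblank for the exact workflow used on kestrel.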
Interactive sessions
Any time you want to run software directly at the command line, whether to provide interactive input at runtime or for other reasons, you should use an interactive session on a compute node.
You can start an interactive shell session on a compute node with the following commands:
qsub -I -l nodes=1:ppn=NUMBEROFCORES
or
qsub -I -q short
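For example (choosing an illustrative core count), a 4-core interactive session could be requested with:
qsub -I -l nodes=1:ppn=4
Type exit when you are finished to end the session and release the compute node.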
Other scheduler commands
qstat (Check queue status)
qstat -u username (check queue status for jobs belonging to username)
qdel JOBID (remove your job with job number JOBID from the queue)
On starling, interactive jobs that need access to the GPUs can be started with the srun command.
For example, request an interactive job with access to 4 GPU cards with this command:
srun --gres gpu:4 --pty bash
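If the interactive session also needs a specific number of CPU cores, the standard Slurm options can be combined; for example (the GPU and core counts below are only illustrative):
srun --gres gpu:1 -c 4 --pty bash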
Other scheduler commands
squeue (Check queue status)
squeue --user=username (check queue status for jobs belonging to username)
scancel JOBID (remove your job with job number JOBID from the queue)
Most software packages installed on owl have a submission script that makes submitting input for that program simple. These scripts are in /usr/local/q_scripts. Using these scripts is the easiest way to submit jobs to the cluster.
For example, Gaussian16 jobs can be submitted to the cluster using the qg16 command:
qg16 inputfile
Some scripts have several options, such as the number of cores to use. To run a 2-core CFOUR job you would use the following command:
qcfour -np 2 jobname.cfour
Usage options for each script can be found by running the script without any arguments.
Interactive sessions
Any time you want to run software directly at the command line, whether to provide interactive input at runtime or for other reasons, you should use an interactive session on a compute node.
You can start an interactive shell session on a compute node with the following commands:
srun -J JOBNAME -c NUMBEROFCORES --pty bash
Other scheduler commands
squeue (Check queue status)
squeue --user=username (check queue status for jobs belonging to username)
scancel JOBID (remove your job with job number JOBID from the queue)
Guidelines, Policies, and Limits
Queue limits
Per-user simultaneous core usage limits are displayed when you connect to each of the clusters via ssh; the limit is typically 200-400 cores. There is no job time limit on any cluster except owl, where classroom jobs are limited to 96 hours.
Disk quotas
On kestrel each user is limited to 1TB of disk space in /home. Home directories should primarily be used for job submission preparation and temporary result storage; they are not a permanent repository for completed research.
Scratch space usage
While your HOME directory is shared across all the nodes, scratch space (/scratch) is local to each node. Using your HOME directory as scratch space for a job will slow your jobs and may slow the jobs of others or make the head node unstable. DO NOT USE YOUR HOME DIRECTORY FOR SCRATCH. If you are not using the predefined submission scripts, please make proper use of /scratch for intermediate file creation during job runs.
Other useful information
Copying files from one cluster to another
Use the scp command to copy files from one cluster system to another.
Usage for the scp command:
scp [options] file_or_folder_name destination_cluster:location
When copying directories, you must add “-r” to the [options] part of the command to recursively copy the directory contents.
Examples: (these copy files and folders to kestrel, so the commands are run from your home directory on the remote system)
scp -r myfolder/ kestrel:/home/username
scp myfile kestrel:/home/username/project/notes
Backups
Due to the constantly changing content of the /home filesystem, no global backups of the HOME directories are made. All cluster filesystems use fault-tolerant disk arrays, so the likelihood of data loss due to mechanical failure is very low. You are responsible for backing up your own data. Files cannot be restored if you delete or overwrite them.
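For example (a rough sketch only: it assumes you can ssh to kestrel from your personal machine, possibly using its full hostname, that /home/username/project is the directory you want to preserve, and that ~/backups already exists on your machine), you could periodically copy results off the cluster with scp, run from your personal machine:
scp -r kestrel:/home/username/project ~/backups/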
Receiving email from the scheduler
To receive email regarding your jobs (errors, etc) from the cluster scheduler, make a .forward file in your /home directory containing your email address.
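For example, substituting your own address for the placeholder yourname@wisc.edu, you could create the file with:
echo yourname@wisc.edu > ~/.forward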
Understanding your PATH
Your PATH is an environment variable that tells the system which directories to search for executable files in response to commands you issue.
If you want to run a program that is not in your PATH you must call it using its full location, like /share/apps/myprograms/myprogram instead of just myprogram. You can also modify your PATH to search other locations not included in your default PATH. Some commands:
Display your PATH:
echo $PATH
Add /my/dir to your PATH:
export PATH=$PATH:/my/dir
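To make such a change permanent (assuming bash is your login shell), you can append the same export line to your ~/.bashrc so it is applied in every new shell:
echo 'export PATH=$PATH:/my/dir' >> ~/.bashrc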
Other computing resources available
You can request allocations on other HPC/HTC resources available on campus and around the world. Here are a few: