Compute2 General Guidelines
Basics
Slurm puts a lot of hardware at your fingertips, but it must be used wisely so you don’t affect others’ work. This page contains guidelines for using Slurm.
Don’t submit 1000 jobs until you’ve seen 1 job finish successfully.
Job submission has overhead. Avoid submitting jobs that complete in just seconds. If you can’t avoid it, look into job arrays.
Limit the total number of files you place in a single directory to the order of thousands at most. Normal filesystem operations become unwieldy with directories of millions of files.
Every Slurm job gets its own temporary directory that is cleaned up for you. Do your work there, if possible. The path is:
/tmp/.
Don’t monopolize queues with large numbers of long-running (multi-day) jobs. Use QOS to limit your running jobs if necessary.
Use srun for interactive jobs and sbatch for batch jobs.
Don’t rely on the host environment to develop or install your software.
Create your own environment using container technology, or
Encapsulate all your dependencies
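The points above about short tasks, per-job temporary space, and limiting concurrent work can be combined in a single batch script. The following is a minimal sketch, not a site-provided template. The array range, the `%10` concurrency cap, the run time, the input naming scheme, and the use of `$TMPDIR` to locate temporary space are all assumptions; substitute the temporary path that Compute2 documents. The fallbacks let the script run under plain bash as a dry run before anything is submitted.

```shell
#!/bin/bash
#SBATCH --job-name=array-sketch
#SBATCH --array=1-100%10        # 100 tasks, at most 10 running at once (assumed sizes)
#SBATCH --time=00:10:00         # assumed per-task run time

# Slurm sets SLURM_ARRAY_TASK_ID for array tasks; default to 1 for a dry run.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Assumption: temporary space is reachable via $TMPDIR, falling back to /tmp.
work_dir="$(mktemp -d "${TMPDIR:-/tmp}/task-${task_id}.XXXXXX")"
trap 'rm -rf "$work_dir"' EXIT

# Hypothetical naming scheme: one input file per array index.
input_name="input_${task_id}.txt"

# Placeholder payload; replace with the real command.
echo "processing ${input_name}" > "${work_dir}/result.txt"
result="$(cat "${work_dir}/result.txt")"
echo "$result"
```

Running the script directly with bash exercises the logic for a single task, which fits the advice to see one job finish before submitting many.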
General Partition Default Job Resources
The resources listed below are the defaults for running jobs in the general partitions. These may not be efficient for user jobs.
Users are expected to understand the needs of their analysis and to request resources accordingly.
Per Job Default Parameters
--cpus-per-task = 1
--mem-per-cpu = 4GB
--time = 8 hours
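Because the default memory is set per CPU, the total memory of a single-task job scales with the number of CPUs requested. The helper below is a hypothetical illustration of that arithmetic (`total_mem_gb` is not part of Slurm). The commented directives show how a batch script might override the defaults; their values are example assumptions.

```shell
#!/bin/bash
# total_mem_gb CPUS MEM_PER_CPU_GB
# Memory a single-task job receives when memory is allocated per CPU.
total_mem_gb() {
    local cpus="$1" mem_per_cpu_gb="$2"
    echo $(( cpus * mem_per_cpu_gb ))
}

default_mem="$(total_mem_gb 1 4)"   # defaults: 1 CPU x 4 GB per CPU
custom_mem="$(total_mem_gb 8 4)"    # 8 CPUs at the default 4 GB per CPU

echo "default job: ${default_mem} GB, 8-CPU job: ${custom_mem} GB"

# In a batch script, explicit requests replace the defaults (example values):
#   #SBATCH --cpus-per-task=8
#   #SBATCH --mem-per-cpu=2G
#   #SBATCH --time=02:00:00
```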
Partition/Queue Configuration
Partition | Priority | Partition Limits | Cost | Description |
|---|---|---|---|---|
workshop | High | Run Time Limit: 24 hours Max Jobs: 16 Max CPU Cores: 8 Max Memory: 64 GB Max GPUs: 1 MIG (1/7 H100 GPU) | No costs | Shared pool of hosts for workshop jobs. It has higher priority than all other general partitions that share hosts with this partition. To ensure that resources are available for the workshop partition, Slurm reservations are used to reserve resources for a specific period for workshop participants. |
general-interactive | High | Run Time Limit: 5 days Max Jobs: 2 Max CPU Cores: 8 Max Memory: 64 GB Max GPUs: 1 MIG (1/7 H100 GPU) | Low Costs per CPU and GPU per hour | Shared pool of CPU and GPU hosts for interactive jobs for entry-level users and developers. NoVNC and OOD Desktop provide a virtual environment for developers to replace local workstations. |
general-short | High | Run Time Limit: 30 minutes Max Jobs: 16 Max CPU Cores: 218 Max Memory: 1744 GB | Low Costs per CPU and GPU per hour | Shared pool of hosts for jobs that run for 30 minutes or less. These resources could be leveraged for backfill. |
general-gpu | Normal | Run Time Limit: 15 days Max GPUs: 8 Max Memory per GPU: 80 GB | Low Costs per CPU and GPU per hour | Shared pool of hosts for GPU jobs. It has higher priority than general-short, which shares hosts with this partition. |
general-cpu | Normal | Run Time Limit: 15 days Max Jobs: 100 Max CPU Cores: 128 Max Memory: 2 TB | Low Costs per CPU and GPU per hour | Shared pool of hosts for general use. |
general-bigmem | Normal | Run Time Limit: 15 days Max CPU Cores: 128 Max Memory: 8 TB | c2-bigmem-* | Shared pool of hosts for general use. |
general-preemptible-cpu | Low | Run Time Limit: 15 days Grace Time: 0 Minutes | No costs | Pool of hosts shared from the subscription and condo partitions. Jobs will be terminated when owners/subscribers need these resources. |
general-preemptible-gpu | Low | Run Time Limit: 15 days Grace Time: 0 Minutes | No costs | Pool of hosts shared from the subscription and condo partitions. Jobs will be terminated when owners/subscribers need these resources. |
condo-$name | Normal | QOS per group | Pay for server + operational fee | Purchase servers and receive a dedicated partition named for the PI lab, department, or school. Optionally include the general pool of resources for easier scheduling. Custom configurations are possible for condo partitions, such as including general resources to augment condo resources without the need to use a different partition. Condo owners can opt into sharing resources in the preemptible queue. |
Unless indicated differently, max resource limits are per user, across all jobs, per partition.
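The run time limits in the table can guide which partition to request. The function below is a simplified, hypothetical helper (`suggest_partition` is not a Slurm command). It considers only run time and whether a GPU is needed, and it ignores the core, memory, and job-count limits as well as the interactive, bigmem, preemptible, and condo partitions.

```shell
#!/bin/bash
# suggest_partition MINUTES NEEDS_GPU
# MINUTES: expected run time in minutes; NEEDS_GPU: 1 if a GPU is required, else 0.
suggest_partition() {
    local minutes="$1" needs_gpu="$2"
    local max_minutes=$(( 15 * 24 * 60 ))   # 15-day run time limit from the table
    if (( minutes > max_minutes )); then
        echo "none"             # exceeds every limit considered here
    elif (( needs_gpu )); then
        echo "general-gpu"
    elif (( minutes <= 30 )); then
        echo "general-short"
    else
        echo "general-cpu"
    fi
}

suggest_partition 20 0     # short CPU job
suggest_partition 600 0    # 10-hour CPU job
suggest_partition 600 1    # 10-hour GPU job

# Possible use at submission time (job.sh is a placeholder script name):
#   sbatch --partition="$(suggest_partition 20 0)" --time=00:20:00 job.sh
```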
Cost Calculator
A calculator is available to help users estimate the cost of jobs.
This can be downloaded here: Cost Calculator
Storage Types
Storage Type | Description | Size | Performance | Storage Persistence | Globus Accessible | SMB Accessible | Across Nodes |
|---|---|---|---|---|---|---|---|
Active Storage | Persistent active storage, such as storage1, storage2, and storage3. The path format is /storageN/fs1/<allocation name>. Replace the | 5TB to many Petabytes. | High | Yes | Yes | Yes | Yes |
Home Directory | User home directory, for example, /home/<washukey> | Default is 50GB | High | Yes | No | No | Yes |
Scratch Storage | Temporary scratch storage, such as scratch2. The path format is /scratch2/fs1/<allocation name> | Default is 10TB | Higher | No (The system deletes data older than 30 days) | No | No | Yes |
Local Storage | Local storage devices, such as /tmp | Less than 10TB normally | Highest | No | No | No | No |
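The table suggests a common pattern: stage input from persistent storage to faster non-persistent storage, compute there, and copy results back before the job ends. The sketch below illustrates that flow under stated assumptions. It uses temporary stand-in directories so it can run anywhere, and the file names and the `tr` command are placeholders for real data and a real workload.

```shell
#!/bin/bash
set -eu

# Stand-in directories so the sketch runs anywhere. On the cluster these would be
# an Active Storage allocation and a node-local directory (both hypothetical here).
active_dir="$(mktemp -d)"    # stands in for /storageN/fs1/<allocation name>
local_dir="$(mktemp -d)"     # stands in for node-local space such as /tmp

echo "sample input" > "${active_dir}/input.txt"

# 1. Stage input from persistent storage to fast local storage.
cp "${active_dir}/input.txt" "${local_dir}/"

# 2. Compute against the local copy (placeholder command).
tr '[:lower:]' '[:upper:]' < "${local_dir}/input.txt" > "${local_dir}/output.txt"

# 3. Copy results back before the job ends; local storage is not persistent.
cp "${local_dir}/output.txt" "${active_dir}/"
staged_result="$(cat "${active_dir}/output.txt")"
echo "$staged_result"

# 4. Clean up the local working copy.
rm -rf "$local_dir"
```

The same flow applies to Scratch Storage, with the caveat from the table that data older than 30 days is deleted, so results still need to be copied to persistent storage.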