C2 THPC

This document briefly details how to consume modules and software built on and for the compute2 platform. There are currently two primary approaches:

  1. Open OnDemand

  2. SLURM

Open OnDemand

Open OnDemand is the RIS-preferred graphical frontend for interacting with the compute2 SLURM cluster. Through it, interactive and batch SLURM jobs can be submitted either by selecting and configuring a pre-defined application or by running custom code via the Terminal. Such jobs can run within the standard C2-THPC container or directly on the host(s).

This is currently the RIS-preferred method for starting interactive jobs for most users.

  1. C2-OOD Quick-Start Guide

SLURM

When using SLURM you are presented with two options.

  1. Running the software bare metal, directly on the host, using sbatch or srun

    1. considered the most performant option

    2. the RIS default and recommended approach whenever possible

  2. Running the software within a “docker” container via pyxis within an sbatch or srun job

    1. negligible performance penalty

    2. flexible and powerful, allowing remapping of bind mounts as on compute1

    3. the only officially supported images are the RockyLinux-based C2-THPC images (and any derivatives of them)

      1. images updated to RockyLinux 9.x along with hosts

      2. users are requested to use C2-THPC as their base when building new Docker images
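In practice the difference between the two options is small: the container variant is the same srun invocation plus two pyxis flags. As an illustrative sketch (resource values are examples only, and the mount list is abbreviated here), the same interactive job looks like:

```shell
# Option 1: bare metal -- runs directly on the host
srun --nodes=1 --ntasks=1 --cpus-per-task=8 --mem=16GB \
     --partition=general --pty /bin/bash

# Option 2: container via pyxis -- identical, plus the two container flags
srun --container-image=ghcr.io#washu-it-ris/ris-thpc:rocky9.2 \
     --container-mounts=/storage2/fs1:/storage2/fs1 \
     --nodes=1 --ntasks=1 --cpus-per-task=8 --mem=16GB \
     --partition=general --pty /bin/bash
```

Full, step-by-step versions of both invocations follow in the sections below.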

Bare Metal

  1. Connect to a compute2 login node

    ssh c2-login-001.ris.wustl.edu
  2. Launch an interactive SLURM job directly on a host.

    srun \
      --nodes=1 \
      --ntasks=1 \
      --cpus-per-task=8 \
      --mem=16GB \
      --partition=general \
      --pty /bin/bash
  3. Load the RIS domain module.

    ml ris
  4. Review the newly available software, loading everything that is required.

    ml av
    ml <app1> <app2> <app3>
    1. Unsure where to find a software/module? Try module spider!

      ml spider <app>
    2. See what software/modules are loaded. Slurm is loaded by default.

      ml
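The interactive walkthrough above can also be packaged as a non-interactive batch job. A minimal sbatch sketch under the same resource values; `<app1>` and `my_analysis.sh` are placeholders for your own module and script, and the script can only run on the cluster:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB
#SBATCH --partition=general
#SBATCH --job-name=bare-metal-example

# Load the RIS domain module, then the application modules.
# <app1> is a placeholder; substitute the module(s) you actually need.
ml ris
ml <app1>

# Run your workload; my_analysis.sh is a hypothetical script.
./my_analysis.sh
```

Submit it from a login node with `sbatch job.sh`.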

Container

  1. Connect to a compute2 login node

    ssh c2-login-001.ris.wustl.edu
  2. Launch an interactive SLURM job that will land on a host and then start a container.

    # Define mount points to pass into the container
    # $HOME and the workdir are added by default
    THPC_CONTAINER_MOUNTS='/cm:/cm,/opt/thpc:/opt/thpc,/storage2/fs1:/storage2/fs1'
    RIS_STORAGE_MOUNTS='/storage2/fs1:/storage2/fs1,/scratch2/fs1:/scratch2/fs1,/storage1/fs1:/storage1/fs1'
    SYSTEM_LMOD_MOUNTS='/etc/profile.d:/etc/profile.d,/etc/sysconfig/modules:/etc/sysconfig/modules'
    CONTAINER_MOUNTS="${THPC_CONTAINER_MOUNTS},${RIS_STORAGE_MOUNTS},${SYSTEM_LMOD_MOUNTS}"

    srun \
      --container-image=ghcr.io#washu-it-ris/ris-thpc:rocky9.2 \
      --container-mounts=${CONTAINER_MOUNTS} \
      --nodes=1 \
      --ntasks=1 \
      --cpus-per-task=8 \
      --mem=16GB \
      --partition=general \
      --pty /bin/bash
    1. There are a few “gotchas” when using a container

      1. You have to define all the mounts you wish to see inside the container, just like on compute1.

      2. The syntax for the container image is different than on compute1

        1. ghcr.io/washu-it-ris/ris-thpc:rocky9.2 => ghcr.io#washu-it-ris/ris-thpc:rocky9.2

          1. note the change of the first / into a #

      3. SLURM commands currently do not work inside these jobs (work in progress)

        1. sinfo, srun, scontrol, etc

This is the only container that RIS directly supports for C2 at this time.
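Because every mount must be listed explicitly, it can help to assemble and sanity-check the mount string in plain bash before passing it to srun. A small sketch using the same variables as the example above:

```shell
#!/bin/bash
# Assemble the pyxis --container-mounts value from logical groups,
# as in the srun example above.
THPC_CONTAINER_MOUNTS='/cm:/cm,/opt/thpc:/opt/thpc,/storage2/fs1:/storage2/fs1'
RIS_STORAGE_MOUNTS='/storage2/fs1:/storage2/fs1,/scratch2/fs1:/scratch2/fs1,/storage1/fs1:/storage1/fs1'
SYSTEM_LMOD_MOUNTS='/etc/profile.d:/etc/profile.d,/etc/sysconfig/modules:/etc/sysconfig/modules'
CONTAINER_MOUNTS="${THPC_CONTAINER_MOUNTS},${RIS_STORAGE_MOUNTS},${SYSTEM_LMOD_MOUNTS}"

# Sanity check: every comma-separated entry should have the src:dst form.
IFS=',' read -ra MOUNTS <<< "$CONTAINER_MOUNTS"
for m in "${MOUNTS[@]}"; do
  src="${m%%:*}"
  dst="${m#*:}"
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "malformed mount entry: $m" >&2
    exit 1
  fi
done
echo "$CONTAINER_MOUNTS"
```

If the script prints the combined string without errors, the value is ready to pass to `--container-mounts`.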

  1. Load the RIS domain module.

    ml ris
  2. Review the newly available software, loading everything that is required.

    ml av
    ml <app1> <app2> <app3>
    1. Unsure where to find a software/module? Try module spider!

      ml spider <app>

The spider command will only find applications that are part of an environment once that environment has been loaded.

E.g. modules available in the ris environment will only show up once it has been loaded:

module load ris
  1. See what software/modules are loaded. Slurm is loaded by default.

    ml
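The container workflow can likewise be submitted as a batch job. A minimal sbatch sketch, with the pyxis flags attached to the srun step inside the allocation; the mount list is abbreviated, `my_analysis.sh` is a hypothetical script, and the script can only run on the cluster:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB
#SBATCH --partition=general

# Mount list abbreviated; see the interactive example above for the full set.
CONTAINER_MOUNTS='/storage2/fs1:/storage2/fs1,/etc/profile.d:/etc/profile.d,/etc/sysconfig/modules:/etc/sysconfig/modules'

# The pyxis container flags go on the srun step inside the batch job.
# A login shell (-l) is used so the Lmod "ml" function is available.
srun --container-image=ghcr.io#washu-it-ris/ris-thpc:rocky9.2 \
     --container-mounts=${CONTAINER_MOUNTS} \
     bash -lc 'ml ris && ./my_analysis.sh'
```

As with the bare-metal variant, submit from a login node with `sbatch job.sh`.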