Research Compute Cluster


About

The Engineering Research Compute Cluster (ENGR Cluster) is a medium-scale installation of heterogeneous compute nodes. It supports common computational tasks and specific Engineering software applications, and includes specialized nodes dedicated to particular research groups' tasks.

Compute nodes are organized into queues, to which you submit jobs. The breakdown below lists the cluster's resources by queue, excluding faculty/lab-owned equipment. Some special queues exist for particular labs or administrative purposes and are not included here, nor are resources that sit in generic queues but are locked to specific lab populations.

See The LSF Scheduler for more information on managing jobs.

Queue: normal
Purpose: VNC & Batch Jobs
Inventory:
  • 480 CPU Cores, avg. 8GB RAM per core
    • 4 Nodes: 36 Cores
    • 10 Nodes: 24 Cores
    • 6 Nodes: 16 Cores
    • 1 Node: 12 Cores
  • 8 NVIDIA GeForce GTX 1080 Ti (1 Node)
  • 4 NVIDIA GeForce RTX 2080 (1 Node)
  • 4 NVIDIA GeForce GTX TITAN
  • 8 Tesla K40m
    (the TITAN and K40m GPUs are split over 3 nodes)
Notes: The K40m and TITAN GPUs are recommended for VNC use only, and may not be supported by current computational libraries.

Queue: interactive
Purpose: Interactive Jobs
Inventory: See the normal queue above.

Queue: cpu-compute-*
Purpose: Batch Jobs
Inventory:
  • 512 CPU Cores
    • 4 Nodes: 128 Cores, 4GB RAM per core
  • 100Gb InfiniBand interconnect
Notes: Three queues are defined:
  • cpu-compute: 7-day job time limit
  • cpu-compute-long: 21-day job time limit
  • cpu-compute-debug: 4-hour time limit, allows interactive jobs

Queue: gpu-compute-*
Purpose: Batch Jobs
Inventory:
  • 8 NVIDIA A100 80GB PCIe (1 Node)
  • 16 NVIDIA A40 48GB (2 Nodes)
  • 768GB RAM per node
  • 100Gb InfiniBand interconnect
Notes: Three queues are defined:
  • gpu-compute: 7-day job time limit
  • gpu-compute-long: 21-day job time limit
  • gpu-compute-debug: 4-hour time limit, allows interactive jobs

Queue: linuxlab
Purpose: Interactive & Batch Jobs
Inventory:
  • 224 CPU Cores, avg. 4GB RAM per core
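As a quick illustration of targeting a queue (the script name and resource counts here are placeholders; see The LSF Scheduler for the full set of options), jobs are submitted to a specific queue with bsub's -q flag:

bsub -q cpu-compute -n 16 ./myjob.sh
bsub -q gpu-compute -gpu "num=1" ./myjob.sh

The -gpu option for requesting GPUs is available in recent LSF versions; if it is not available on the cluster, consult The LSF Scheduler page for the local method of requesting GPUs.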

Accessing the Research Compute Cluster

All password prompts when logging into ENGR Cluster systems use your WUSTL Key.

The ENGR Cluster can be accessed over SSH through these hosts:

ssh.engr.wustl.edu
ssh2.engr.wustl.edu
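
For example, to connect (wustlkey is a placeholder for your own WUSTL Key ID):

ssh wustlkey@ssh.engr.wustl.edu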

Alternatively, you can use the Open OnDemand interface at:

https://compute.engr.wustl.edu

WUSTL Key Issues

If you are having trouble logging into the above systems with your WUSTL Key, first make sure you can log into another WUSTL Key service, such as SIS. If you cannot, call 314-933-3333 for support at any time of day.

If you specifically cannot SSH to the hosts above, your IP may have been blocked by automated systems. Try the other listed host, or join the WUSTL VPN (https://it.wustl.edu/items/connect/) to obtain a new, unblocked IP.

You can check which IP you are coming from at http://whatismyip.com. If you believe it has been blocked, try the alternatives above, then email that IP address to support@seas.wustl.edu with a note of when you last tried logging in and which host you were logging in to.

RIS Storage Access

RIS Storage access is governed by specific requirements - please read!

RIS storage is accessed at the path

/storage1/piname

where "piname" is the WUSTL Key login of the PI or storage owner. 

Access is granted via WUSTL Key - on the ENGR cluster, this translates to having a valid Kerberos ticket. If you’ve SSH’d into an ENGR host with your WUSTL Key password, you will have a valid ticket. If you have logged in with an SSH key, you will not.

To generate or refresh a Kerberos ticket, use the command

kinit

and enter your WUSTL Key when prompted.
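
You can check whether you have a valid ticket, and when it expires, with:

klist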


Kerberos tickets in the ENGR cluster have these properties:

  • From the initial password entry, Kerberos tickets last for 10 hours.
  • Tickets on ssh.engr.wustl.edu and ssh2.engr.wustl.edu will not refresh automatically, so they expire after those 10 hours.
  • Inside a submitted batch job, tickets will refresh for 7 days from their original creation/refresh time. This means that if you log in to ssh.engr.wustl.edu on Monday and submit a job on Wednesday, the ticket only has 5 days of life left!
  • When your Kerberos ticket expires, your access to RIS storage will break.


If you habitually leave live connections to the login machines open, you may want to get into the habit of running kinit to refresh your ticket before submitting a job.

Long Jobs With RIS Storage

If your job uses RIS storage and the ticket expires, the job will break. To avoid this, you can generate a keytab file that allows the cluster to renew your Kerberos ticket for much longer - the keytab lasts until you change your WUSTL Key password, at which point it must be regenerated.

Instructions for generating keytabs are on the cluster website at https://compute.engr.wustl.edu
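
As a rough sketch of how a keytab is used once generated (the file path and principal here are placeholders - follow the website's instructions for the actual values), a ticket can be refreshed non-interactively with kinit's -kt option:

kinit -kt $HOME/mykeytab.keytab $USER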

VNC Sessions, Jupyter Notebooks, VSCode IDEs and RIS Storage

At the current time, VNC jobs do not start with the Kerberos ticket from your initial login to the web service. If you have created a keytab as described above, it will initialize a ticket when the VNC session starts. Otherwise, if you wish to access RIS storage from inside a VNC session, run

kinit

You will be prompted for your WUSTL Key. You should also do this before submitting an LSF job from inside a VNC session. After you authenticate, you will have an active Kerberos ticket and will be able to access the RIS storage mounts.

Open OnDemand

The Open OnDemand interface is at https://compute.engr.wustl.edu - log in with your WUSTL Key and Duo 2FA.

Files

Open OnDemand provides a file browser interface. Please note that, at this time, the file browser cannot access RIS storage.

Within the file browser you can upload, download, and move files. You can also edit plain-text files in the browser, or open the current location in a web-based terminal.

VNC Desktops

There are several VNC sessions available. Many are marked with a specific PI’s name, and are only accessible to users within that PI’s lab.

Cluster Desktop - Virtual GL

This session starts on one of three GPU hosts with older cards, specifically set up to let you tie your VNC session to a GPU so that GUI applications render correctly on it.

When you start your GUI application, prepend the command with ‘virtualgl’ - for example

virtualgl glxgears

Jupyter Notebooks

Custom iPython Kernels

Jupyter notebooks start with the Anaconda installation you would get by executing ‘module add seas-anaconda3’ in a terminal. From there, you can build a custom Anaconda environment.

If you use VNC, it is not recommended to let Anaconda add itself to your .bashrc, as this interferes with the VNC environment's ability to start.
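
If Anaconda has already added an init block to your .bashrc, one possible mitigation (assuming a standard conda init setup) is to stop the base environment from auto-activating, or to remove the init block entirely:

conda config --set auto_activate_base false
conda init --reverse bash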

Inside the new Anaconda environment, you can then execute

ipython kernel install --user --name=envname

Start a new Jupyter session, and you can then find that kernel from the ‘New’ dropdown within Jupyter.
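
Putting the steps together, a typical terminal session might look like this (envname and the Python version are placeholders):

module add seas-anaconda3
conda create -n envname python=3.10 ipykernel
conda activate envname
ipython kernel install --user --name=envname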

Extra Jupyter Arguments

You may place extra arguments for Jupyter's startup in this field. Common options include:

--NotebookApp.notebook_dir=/storage1/piname/Active

The above starts Jupyter in a specific directory. You must have a keytab established, as described above, for this to work on RIS storage locations.


VSCode IDE

This OnDemand application starts a VSCode IDE on a cluster node in your browser.

The Working Directory field allows you to start the IDE in a specific working directory; it defaults to your home directory.

Research Compute Cluster & RIS Storage

If you wish to access a PI’s RIS storage as your starting working directory, you must have a keytab set up as described on the landing page of the Research Compute Cluster website (https://compute.engr.wustl.edu).


Batch Jobs

Batch jobs are the most efficient way to perform computations on the cluster - you submit a job script, which then runs unassisted, without monitoring, on a compute node that meets your requirements.
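
A minimal sketch of a job script, myjob.sh (the queue, resource counts, and the program being run are placeholders):

#!/bin/bash
#BSUB -J myjob
#BSUB -q normal
#BSUB -n 4
#BSUB -o myjob.%J.out
#BSUB -e myjob.%J.err

cd $HOME/project
./run_analysis.sh

Submit it with:

bsub < myjob.sh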




Software on the Compute Cluster
