About

The Engineering Research Compute Cluster (ENGR Cluster) is a medium-scale installation of heterogeneous compute nodes. It supports common computational tasks and specific Engineering software applications, and includes specialized nodes for specific research group tasks.

Excluding faculty/lab owned equipment, the structure of the ENGR cluster is shown below, broken down into resources by queue.

Compute nodes are organized into queues, to which you submit jobs. Some special queues exist for particular labs or administrative purposes and are not included here. Resources that sit in generic queues but are locked to specific lab populations are also omitted.

See The LSF Scheduler for more information on managing jobs.

Queue: normal
Purpose: VNC & Batch Jobs
Inventory and notes:
  • 480 CPU Cores, avg. 8GB RAM per core
  • 16 NVIDIA GeForce GTX 1080 Ti (2 Nodes)

  • 4 NVIDIA GeForce GTX TITAN
  • 8 Tesla K40m
    (above GPUs split over 3 nodes)

The K40 and TITAN GPUs are recommended for VNC use only, and may not be supported by current computational libraries.

  • 2 Nodes: (2x) 36 Core Xeon 62xx, 384GB RAM
  • 2 Nodes: (2x) 36 Core Xeon 61xx, 384GB RAM
  • 4 Nodes: (2x) 16 Core Xeon E5-v3, 256GB RAM
  • 1 Node: (2x) 20 Core Xeon E5-v3, 128GB RAM
  • 2 Nodes: (2x) 24 Core Xeon E5-v2, 256GB RAM
  • 8 Nodes: (2x) 24 Core Xeon E5-v2, 128GB RAM
  • 1 Node: (2x) 12 Core Xeon E5-v2, 128GB RAM
  • 1 Node: (2x) 16 Core Xeon E5, 128GB RAM

Queue: interactive
Purpose: Interactive Jobs
Inventory and notes: See above (same resources as the normal queue)

Queue: cpu-compute-*
Purpose: Batch Jobs
Inventory and notes:
  • 1024 CPU Cores, avg 4GB RAM/core
  • 100GB IB Interconnectivity
  • 4 Nodes: (2x) 64 Core AMD Epyc 77xx, 512GB RAM
  • 4 Nodes: (2x) 64 Core AMD Epyc 77xx, 1024GB RAM

Three queues are defined:

  • cpu-compute: 7 day job time limit
  • cpu-compute-long: 21 day job time limit
  • cpu-compute-debug: 4 hour time limit, allows interactive jobs

Queue: gpu-compute-*
Purpose: Batch Jobs
Inventory and notes:
  • 12 NVIDIA A100 80GB SXM4 (3 Nodes)
  • 8 NVIDIA A100 80GB PCIe (1 Node)
  • 16 NVIDIA A40 48GB (2 Nodes)
  • 768GB+ RAM Per Node
  • 100GB IB Interconnectivity
  • 3 Nodes: (2x) 24 Core Xeon 62xx, 768GB RAM (A100 PCIe, A40)
  • 3 Nodes: (2x) 32 Core AMD Epyc 75xx, 1TB RAM (A100 SXM4)

Three queues are defined:

  • gpu-compute: 7 day job time limit
  • gpu-compute-long: 21 day job time limit
  • gpu-compute-debug: 4 hour time limit, allows interactive jobs

Queue: linuxlab
Purpose: Interactive & Batch Jobs
Inventory and notes:
  • 256 CPU Cores, avg 4GB RAM per core
  • 8 Nodes: (2x) Xeon E5-v3, 128GB RAM
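
The time limits listed for the cpu-compute-* and gpu-compute-* queues can be turned into a small queue-selection helper. The sketch below is illustrative only: pick_queue is a hypothetical function, not something provided on the cluster, and it simply returns the queue with the shortest limit that still fits the expected runtime in hours. Whether short production runs belong in the debug queues is a local policy question this sketch does not answer.

```shell
#!/bin/bash
# Hypothetical helper (not part of the cluster): choose a queue by expected runtime.
# Limits come from the queue list: 4 hours (debug), 7 days, 21 days (long).
pick_queue() {
    local family="$1" hours="$2"   # family is "cpu" or "gpu"
    if [ "$hours" -le 4 ]; then
        echo "${family}-compute-debug"
    elif [ "$hours" -le $((7 * 24)) ]; then
        echo "${family}-compute"
    elif [ "$hours" -le $((21 * 24)) ]; then
        echo "${family}-compute-long"
    else
        echo "No ${family}-compute queue allows a ${hours}-hour job" >&2
        return 1
    fi
}
```

For example, pick_queue gpu 100 selects gpu-compute, because 100 hours exceeds the 4 hour debug limit but fits within 7 days.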

Accessing the Research Compute Cluster

All password prompts when logging into ENGR Cluster systems expect your WUSTL Key.

The ENGR Cluster can be accessed over SSH through these hosts:

ssh.engr.wustl.edu
ssh2.engr.wustl.edu

or the Open OnDemand interface at:

https://compute.engr.wustl.edu
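
For SSH access, a minimal sketch of a login helper that tries each listed host in turn is shown here. The function name engr_ssh and the 10 second timeout are assumptions chosen for illustration, and the argument stands in for your own WUSTL Key login name.

```shell
#!/bin/bash
# Hypothetical helper: try each ENGR login host until one accepts the connection.
ENGR_HOSTS="ssh.engr.wustl.edu ssh2.engr.wustl.edu"

engr_ssh() {
    local user="$1" host status
    for host in $ENGR_HOSTS; do
        # ConnectTimeout keeps an unreachable or blocked host from hanging the loop.
        ssh -o ConnectTimeout=10 "${user}@${host}"
        status=$?
        # ssh exits with 255 when the connection itself failed;
        # any other status came from the remote session, so stop here.
        if [ "$status" -ne 255 ]; then
            return "$status"
        fi
        echo "Could not connect to ${host}, trying the next host..." >&2
    done
    return 255
}
```

Run it as engr_ssh followed by your WUSTL Key login name. If both hosts fail, the steps under WUSTL Key Issues are the next thing to try.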

WUSTL Key Issues

If you are having issues with your WUSTL Key logging into the above systems, make sure you can log into another WUSTL-key'd service, like SIS. If you cannot, call 314-933-3333 for support any time of day.

If the problem is specific to SSH on the hosts above, your IP may have been blocked by automatic systems. Try the other host listed, or join the WUSTL VPN (https://it.wustl.edu/items/connect/) to gain a new, unblocked IP.

You can check which IP you are coming from at http://whatismyip.com. If you believe it is blocked, try the alternate methods above, then email that IP address to support@seas.wustl.edu with a note of when you last tried logging in and which host you were logging in to.

RIS Storage Access

Warning
RIS Storage access is governed by specific requirements - please read!

RIS storage is accessed at the path

/storage1/piname

where "piname" is the WUSTL Key login of the PI or storage owner. 

Access is granted via WUSTL Key - on the ENGR cluster, this translates to having a valid Kerberos ticket. If you’ve SSH’d into an ENGR host with your WUSTL Key, you will have a valid ticket. If you have logged in with an SSH Key, you will not.

To generate or refresh a Kerberos ticket, use the command

kinit

and enter your WUSTL Key when prompted.
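
To avoid re-entering your password when a ticket is still valid, you can check the ticket first. The sketch below assumes the MIT Kerberos klist found on most Linux systems, where klist -s prints nothing and reports validity through its exit status. The function name ensure_ticket is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: run kinit only when there is no valid Kerberos ticket.
ensure_ticket() {
    # klist -s is silent and exits non-zero if the ticket cache is missing or expired.
    if klist -s; then
        echo "Kerberos ticket is still valid"
    else
        kinit   # prompts for your WUSTL Key
    fi
}
```

Running klist with no arguments lists the current ticket and its expiry time, which is useful when deciding whether to refresh.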


Warning

Kerberos tickets in the ENGR cluster have these properties:

  • From the initial password entry, Kerberos tickets last for 10 hours
  • Tickets on ssh.seas.wustl.edu or ssh2.seas.wustl.edu will not refresh automatically after 10 hours.
  • Inside a submitted batch job, tickets will refresh for 7 days from their original creation/refresh time. This means that if you log in to ssh.seas.wustl.edu on Monday, and submit a job Wednesday, the ticket only has 5 days of life left!
  • When your Kerberos ticket expires, your access to RIS storage will break.
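
The Monday/Wednesday example above is simple subtraction from the 7 day window. A small sketch, with the hypothetical function ticket_days_left taking the number of days since your last password entry or kinit:

```shell
#!/bin/bash
# Days of in-job ticket renewal remaining, given days since the last login or kinit.
RENEW_WINDOW_DAYS=7   # renewal window for batch jobs, from the list above

ticket_days_left() {
    local elapsed_days="$1"
    local left=$(( RENEW_WINDOW_DAYS - elapsed_days ))
    # A negative result means the window has already closed.
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

# Logged in on Monday, submitting on Wednesday: 2 days have elapsed.
ticket_days_left 2   # prints 5
```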


If you habitually leave connections to the terminal machines open, you may want to get into the habit of running kinit to refresh your ticket before submitting a job.
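
One way to build that habit is a small wrapper that refreshes the ticket and then submits. This is a sketch under assumptions: fresh_bsub is a hypothetical name, and the job script path is passed as an argument so that only bsub, not kinit, reads the script on standard input.

```shell
#!/bin/bash
# Hypothetical helper: refresh the Kerberos ticket, then submit a job script.
fresh_bsub() {
    local script="$1"
    # Stop without submitting if authentication fails.
    kinit || return 1
    # bsub reads the job script, including any #BSUB lines, from standard input.
    bsub < "$script"
}
```

Calling fresh_bsub with the path to a job script prompts for your WUSTL Key first, so the 7 day renewal window starts at submission time.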

Long Jobs With RIS Storage

If you are using RIS storage with your job, and the ticket expires, this will break the job. In order to avoid this, you can generate a keytab file that allows the cluster to renew your Kerberos ticket for much longer - this keytab will last until you change your WUSTL Key password, at which point it must be regenerated.

Note

Instructions for generating keytabs are on the cluster website at https://compute.engr.wustl.edu

VNC Sessions, Jupyter Notebooks, VSCode IDEs and RIS Storage

At the current time, VNC jobs do not inherit the Kerberos ticket from your initial login to the web service. If you have created a keytab as described above, it will initialize when the VNC session starts. Otherwise, if you wish to access the RIS storage from inside a VNC session, run

kinit

You will be prompted for your WUSTL Key. You should also do this before submitting an LSF job from inside a VNC session. After you authenticate, you will have an active Kerberos ticket and will be able to access the RIS storage mounts.

Open OnDemand

The Open OnDemand interface is at https://compute.engr.wustl.edu - log in with your WUSTL Key, using Duo 2FA.

Files

Open OnDemand provides a file browser interface. Please note that at this time, the file browser cannot access RIS storage.

Within the file browser you can upload, download, and move files. You can also edit plain text files within the browser, or open the current location in a web-based terminal.

VNC Desktops

There are several VNC sessions available. Many are marked with a specific PI’s name, and are only accessible to users within that PI’s lab.

Cluster Desktop - Virtual GL

This desktop starts on one of three GPU hosts with older cards. These hosts are set up so that you can tie your VNC session to a GPU, which allows GUI applications to render correctly using the GPU.

When you start your GUI application, prefix the command with ‘virtualgl’ - for example

virtualgl glxgears

Jupyter Notebooks

Custom iPython Kernels

Jupyter notebooks start with the same Anaconda installation you get by running ‘module add seas-anaconda3’ in a terminal. From there you can build a custom Anaconda environment.

Letting Anaconda add itself to your .bashrc is not recommended if you use VNC, as it interferes with the VNC environment's ability to start.

Inside the new Anaconda environment, you can then execute

ipython kernel install --user --name=envname

Start a new Jupyter session, and you can then find that kernel from the ‘New’ dropdown within Jupyter.
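
The whole sequence can be sketched as one function to paste into a terminal session on the cluster. The name make_kernel_env and the extra package arguments are hypothetical. The sketch assumes the conda provided by the module supports conda run, which avoids needing conda activate and therefore any change to your .bashrc.

```shell
#!/bin/bash
# Hypothetical helper: create a conda environment and register it as a Jupyter kernel.
# Usage: make_kernel_env envname [extra packages...]
make_kernel_env() {
    local envname="$1"
    shift
    module add seas-anaconda3 || return 1
    # ipykernel is the package that lets Jupyter run code in this environment.
    conda create --yes --name "$envname" ipykernel "$@" || return 1
    # Run the registration command inside the new environment without activating it.
    conda run --name "$envname" ipython kernel install --user --name="$envname"
}
```

For instance, make_kernel_env envname numpy would create an environment called envname containing ipykernel and numpy, then register it under the same name.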

Extra Jupyter Arguments

You may place extra arguments for Jupyter's startup in this field. Common options include:

--NotebookApp.notebook_dir=/storage1/piname/Active

The above starts Jupyter in a specific directory. You must have a keytab established, as described above, for this to work on RIS storage locations.




Batch Jobs

Batch jobs are the most efficient way to perform computations on the cluster - you can submit a job script file, which will then run on a compute node that meets your requirements. It runs unassisted, without needing monitoring.
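
A minimal job script might look like the sketch below. The job name, slot count, run-time limit, and output file names are placeholder values chosen for illustration, and the queue is one of those listed earlier on this page. See The LSF Scheduler for the options actually recommended on this cluster.

```shell
#!/bin/bash
# Job name (placeholder)
#BSUB -J example-job
# Queue to submit to
#BSUB -q cpu-compute
# Number of job slots (cores) requested
#BSUB -n 4
# Run-time limit in hours:minutes
#BSUB -W 1:00
# Standard output and error files; %J expands to the job ID
#BSUB -o example-job.%J.out
#BSUB -e example-job.%J.err

# LSF sets LSB_JOBID inside a running job; the fallback lets the script run outside LSF.
job_id="${LSB_JOBID:-not-in-lsf}"
echo "Job ${job_id} started on $(uname -n)"

# Replace this line with your actual workload.
echo "Hello from the ENGR cluster"
```

Assuming the script is saved as example-job.lsf, it would be submitted with bsub < example-job.lsf, which lets LSF read the #BSUB lines as submission options.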




Software on the Compute Cluster
