...

The Engineering Research Compute Cluster (ENGR Cluster) is a medium-scale installation of heterogeneous compute nodes. It supports common computational tasks and specific Engineering software applications, and it includes specialized nodes for particular research group tasks.

Excluding faculty- and lab-owned equipment, the ENGR cluster is structured as shown below, with resources broken down by queue.

Compute nodes are organized into queues, and you submit jobs to a queue. Some special queues exist for particular labs or for administrative purposes; these are not listed here. Resources that sit in general queues but are restricted to specific lab groups are also not listed.

See The LSF Scheduler for more information on managing jobs.

Queue: normal
Purpose: VNC & batch jobs
Inventory:
  • 480 CPU cores, avg. 8GB RAM per core
  • 16 NVIDIA GeForce GTX 1080 Ti (2 nodes)
  • 4 NVIDIA GeForce GTX TITAN
  • 8 NVIDIA Tesla K40m
    (the TITAN and K40m GPUs are split over 3 nodes)
  • 2 nodes: (2x) 36 Core Xeon 62xx, 384GB RAM
  • 2 nodes: (2x) 36 Core Xeon 61xx, 384GB RAM
  • 4 nodes: (2x) 16 Core Xeon E5-v3, 256GB RAM
  • 1 node: (2x) 20 Core Xeon E5-v3, 128GB RAM
  • 2 nodes: (2x) 24 Core Xeon E5-v2, 256GB RAM
  • 8 nodes: (2x) 24 Core Xeon E5-v2, 128GB RAM
  • 1 node: (2x) 12 Core Xeon E5-v2, 128GB RAM
  • 1 node: (2x) 16 Core Xeon E5, 128GB RAM
Notes:
  The K40 and TITAN GPUs are recommended for VNC use only, because current computational libraries may not support them.

Queue: interactive
Purpose: Interactive jobs
Inventory: See the normal queue above.

Queue: cpu-compute-*
Purpose: Batch jobs
Inventory:
  • 1024 CPU cores, avg. 4GB RAM per core
  • 100Gb/s InfiniBand interconnect
  • 4 nodes: (2x) 64 Core AMD Epyc 77xx, 512GB RAM
  • 4 nodes: (2x) 64 Core AMD Epyc 77xx, 1024GB RAM
Notes:
  Three queues are defined:
  • cpu-compute: 7-day job time limit
  • cpu-compute-long: 21-day job time limit
  • cpu-compute-debug: 4-hour job time limit; allows interactive jobs

Queue: gpu-compute-*
Purpose: Batch jobs
Inventory:
  • 12 NVIDIA A100 80GB SXM4 (3 nodes)
  • 8 NVIDIA A100 80GB PCIe (1 node)
  • 16 NVIDIA A40 48GB (2 nodes)
  • 768GB+ RAM per node
  • 100Gb/s InfiniBand interconnect
  • 3 nodes: (2x) 24 Core Xeon 62xx, 768GB RAM (A100 PCIe, A40)
  • 3 nodes: (2x) 32 Core AMD Epyc 75xx, 1TB RAM (A100 SXM4)
Notes:
  Three queues are defined:
  • gpu-compute: 7-day job time limit
  • gpu-compute-long: 21-day job time limit
  • gpu-compute-debug: 4-hour job time limit; allows interactive jobs

Queue: linuxlab
Purpose: Interactive & batch jobs
Inventory:
  • 256 CPU cores, avg. 4GB RAM per core
  • 8 nodes: (2x) Xeon E5-v3, 128GB RAM
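Because the cpu-compute-* and gpu-compute-* families share the same time limits, choosing between the standard and long batch queues reduces to comparing the requested run time against 168 hours (7 days) and 504 hours (21 days). The function below is a minimal sketch of that comparison. `pick_queue` is a hypothetical helper, not a command provided by the cluster, and it deliberately never returns a -debug queue, on the assumption that those are meant for short debugging and interactive sessions rather than routine batch work.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not provided by the cluster): print the batch queue
# whose time limit fits a requested run time, using the limits listed above.
#   <family>-compute      : 7 days  = 168 hours
#   <family>-compute-long : 21 days = 504 hours
# The -debug queues (4 hours, interactive) are intentionally never chosen.
pick_queue() {
  local family="$1"  # "cpu" or "gpu"
  local hours="$2"   # requested wall-clock time in whole hours

  case "$family" in
    cpu|gpu) ;;
    *) echo "pick_queue: unknown family '$family'" >&2; return 1 ;;
  esac

  # Accept only positive integers; 10# forces base 10 so "08" is valid.
  if ! [[ "$hours" =~ ^[0-9]+$ ]] || (( 10#$hours < 1 )); then
    echo "pick_queue: hours must be a positive integer" >&2
    return 1
  fi
  hours=$((10#$hours))

  if (( hours <= 168 )); then
    echo "${family}-compute"
  elif (( hours <= 504 )); then
    echo "${family}-compute-long"
  else
    echo "pick_queue: ${hours} hours exceeds the 21-day limit" >&2
    return 1
  fi
}
```

For example, `bsub -q "$(pick_queue cpu 48)" -W 48:00 < job.sh` would submit a 48-hour job script to cpu-compute. For an interactive shell in a debug queue, the usual LSF form is `bsub -Is -q cpu-compute-debug bash`; see The LSF Scheduler page for site-specific options.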

Accessing the Research Compute Cluster

...

virtualgl glxgears

Jupyter Notebooks

Custom iPython Kernels

Jupyter notebooks start with the same Anaconda installation that is loaded when you run ‘module add seas-anaconda3’ in a terminal. From there, you can build a custom Anaconda environment.

If you use VNC, do not let Anaconda add itself to your .bashrc; doing so can prevent the VNC environment from starting.

With the new Anaconda environment activated, run:

ipython kernel install --user --name=envname

Start a new Jupyter session; the new kernel will then appear in Jupyter's ‘New’ dropdown.
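The full sequence, from loading the module to registering the kernel, might look like the following. This is a minimal sketch that assumes the seas-anaconda3 module provides the `conda` command; the environment name `envname` and the package list are placeholders. It activates the environment with `conda shell.bash hook`, which affects only the current shell and does not modify .bashrc. These commands only work on the cluster, where the `module` command is available.

```bash
# Load the same Anaconda installation that Jupyter uses.
module add seas-anaconda3

# Create an environment; "envname" and the packages are placeholders.
# ipykernel is needed for the kernel install step below.
conda create -n envname python ipykernel

# Activate the environment for this shell only, without editing ~/.bashrc.
eval "$(conda shell.bash hook)"
conda activate envname

# Register the environment as a Jupyter kernel for your user account.
ipython kernel install --user --name=envname

# Optional: confirm that the kernel is registered.
jupyter kernelspec list
```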

Extra Jupyter Arguments

You can enter extra arguments for Jupyter's startup in this field. A common option is:

--NotebookApp.notebook_dir=/storage1/piname/Active

This option starts Jupyter in the specified directory. For RIS storage locations, it works only if you have established a keytab, as described above.


VSCode

...

VSCode IDE (OnDemand)


Batch Jobs

Batch jobs are the most efficient way to run computations on the cluster: you submit a job script, and the scheduler runs it on a compute node that meets your stated requirements. The job runs unattended, so you do not need to monitor it.
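A minimal job script might look like the following. The `#BSUB` lines are LSF directives read by `bsub`; the job name, queue, core count, time limit, and output file names are example values chosen for illustration, and the OpenMP thread setting assumes a multithreaded program. Check The LSF Scheduler page for this cluster's specific requirements (for example, how GPUs are requested).

```bash
#!/bin/bash
# Example directives (values are illustrative):
#   -J job name, -q queue, -n cores (slots),
#   -R "span[hosts=1]" keeps all cores on one node,
#   -W wall-clock limit as hh:mm, -o/-e output files (%J = job ID).
#BSUB -J example_job
#BSUB -q cpu-compute
#BSUB -n 4
#BSUB -R "span[hosts=1]"
#BSUB -W 02:00
#BSUB -o example_job.%J.out
#BSUB -e example_job.%J.err

# LSF sets LSB_DJOB_NUMPROC to the number of allocated slots; fall back
# to 1 so the script can also be tried outside LSF.
NCORES="${LSB_DJOB_NUMPROC:-1}"

# Assumption: the workload is multithreaded via OpenMP.
export OMP_NUM_THREADS="$NCORES"

echo "Job ${LSB_JOBID:-(not under LSF)} on ${HOSTNAME:-unknown host} using ${NCORES} core(s)"

# Replace this with your actual computation, e.g. (hypothetical program):
# ./my_program --threads "$NCORES"
```

Submit the script with `bsub < job.sh`; redirecting the file is the usual way to have `bsub` read the `#BSUB` lines. You can then check the job's status with `bjobs`.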




Software on the Compute Cluster

...