FAQ

storageN

  • The use of storageN within these documents indicates that any storage platform can be used.

  • Currently available storage platforms:

    • storage1

    • storage2

    • storage3

General

RIS is the Research Infrastructure Services team within Wash U IT. We are a young group, having been incorporated into Wash U IT in 2018. The RIS team strives to build and deliver services related to the research mission of Wash U IT. To date, our service catalog includes only Research Storage and will soon include Research Computing. Our product roadmap includes Application Platforms. Generally speaking, RIS attempts to find common patterns in the needs of Wash U research faculty and develops services that address those needs. We do not solve all research IT problems; rather, we attempt to reduce the costs of the “big ticket items” that represent significant challenges to researchers: high performance and high capacity storage, high performance and high throughput computing, large scale data transfer, and some narrowly focused research applications.

RIS uses the RIS Service Desk to interact with users. Each category of the Service Desk includes an “Ask a question” option related to that service. The RIS Service Desk is not the same thing as ServiceNow, which some see as a deficiency and others as an advantage. We will be working to integrate with Wash U IT and to link our systems to make it easier for users to contact us.

RIS services are described in the RIS Service Catalog.

RIS services are for Wash U faculty and staff, focused on the research mission of Wash U.

It is assumed that users will be research faculty and staff and have WashU Key IDs, have access to Wash U IT Networks, and have access to a departmental WorkDay Cost Center Number.

With these prerequisites in place, you can get started with our onboarding services by filling out the form here: Get Onboarded to RIS

Below is a guide on how to use the Service Desk to request new RIS services.

There are a few ways that users can get help with RIS Services.

  • The documentation, especially the FAQ (where you currently are).

  • The chatbot that is part of the RIS Service Desk. It has access to our documentation and can provide assistance.

  • Submitting a ticket via the RIS Service Desk.

You are reading it! RIS service documentation lives at this Confluence site in two places:

The descriptions below explain each launch stage as it relates to RIS.

Early Access

Early-access features are limited to a closed group of testers for a limited subset of launches. Participation is by invitation only. These features may be unstable, change in backward-incompatible ways, and are not guaranteed to be released. There are no SLAs provided and no technical support obligations.

Alpha

Alpha is a limited-availability test before releases are cleared for more widespread use. Our focus with alpha testing is to verify functionality and gather feedback from a limited set of customers. Typically, alpha participation is by invitation and subject to pre-general-availability terms. Alpha releases don’t have to be feature complete, no SLAs are provided, and there are no technical support obligations. However, alphas are generally suitable for use in test environments.

Beta

At beta, products or features are ready for broader customer testing and use. There are no SLAs or technical support obligations in a beta release unless otherwise specified in product terms or the terms of a particular beta program.

General Availability

General availability products and features are open to all customers, covered by the RIS SLAs, and are ready for production use.

Deprecated

Deprecated features are scheduled to be shut down and removed.

Note: Depending on product maturity and engineering needs, a RIS product or feature may not go through every launch stage, and the time between launch phases may vary.

  • The Research Datacenter is physically located at:

    222 S Newstead Ave.

    St. Louis, MO 63110

  • You can take a virtual tour of the data center

  • If you are looking to collaborate with someone outside WashU, you will need to have a WashU guest account created for the user.

  • Please use this link for starting this process: https://connect.wustl.edu/guest/guestrequest/

  • Once the guest account has been set up through that process, you can create a ticket in our Service Desk and we can add the user to the appropriate allocations.

Scientific Compute Platforms

  • We suggest you use SSH keys to log into the compute clients; see SSH Key-Pair Setup.

  • See our Quick Start guide for submitting your first job. Further information can be found elsewhere in this documentation for more complex examples.

  • Faculty members may purchase dedicated hardware for their labs to form what we refer to as a “condominium”. In this model, a “condo” is formed out of a set of hardware that we put into a Host Group.

  • Then we create a Queue/Partition named after the Lab. E.g. for Compute1: labname and labname-interactive.

  • Then we create an AD group named compute-labname and populate it with Users. That group then gets priority access to that lab.

  • Yes. The “general” and “interactive” job queues are serviced by sets of execution nodes in Host Groups named general and general-interactive on Compute1, and general-cpu, general-gpu, general-interactive, general-short, general-preempt-cpu, and general-preempt-gpu on Compute2.

  • The general queue runs batch jobs much like the traditional HPC setting. They run in the background in the queue system.

  • The general queue also makes use of a cache system, which you can learn more about here.

  • Jobs in the general queue can run for up to 28 days.

  • The general-interactive queue runs jobs interactively so that you can interact directly with them or watch a job.

  • The general-interactive queue does not use the cache system and instead interfaces with the Storage Platform directly.

  • Jobs in the general-interactive queue can run for up to 24 hours.

  • Please see the general queue policies for more information.
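
As a rough illustration of the runtime limits above, here is a minimal sketch that checks a planned runtime against the limits quoted for the general and general-interactive queues. The function names max_hours and fits_queue are made up for this example and are not RIS tools; the general queue policies remain the authoritative source for limits.

```shell
#!/bin/sh
# Sketch: check a planned runtime against the queue limits quoted in this FAQ.
# max_hours and fits_queue are hypothetical helper names, not RIS commands.

# max_hours QUEUE -> prints the runtime limit in hours (28 days = 672 hours)
max_hours() {
    case $1 in
        general)             echo $((28 * 24)) ;;
        general-interactive) echo 24 ;;
        *) echo "unknown queue: $1" >&2; return 1 ;;
    esac
}

# fits_queue QUEUE HOURS -> succeeds when HOURS is within the queue limit
fits_queue() {
    limit=$(max_hours "$1") || return 1
    [ "$2" -le "$limit" ]
}

# A 36-hour job does not fit the interactive queue, so report that.
fits_queue general-interactive 36 || echo "36h exceeds the general-interactive limit; consider general"
```

A job that needs to be watched for longer than 24 hours would therefore need to run as a batch job in the general queue instead.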

  • Colocation facilities worthy of hosting production quality computing hardware, datacenter space

  • Power and cooling of the physical space

  • Physical security

  • Identity Management: User accounts, groups, access controls and permissions

  • Execution nodes: Varying by CPU flavor, speed, RAM quantity, local hard drive space, etc.

  • Networking: All of the above for networking systems

  • Storage: All of the above for storage systems

  • Data security: Operating system and software updates, incident response

  • Integration: Interconnects that provide appropriate bandwidth and Input/Output operations per second

  • Integration with Cloud Services

  • Integration with storage tiers, tape libraries, tape robots, data movers

  • Integration with data movement, specialized technologies like Globus

  • Operations: Monitoring, alerting, event response

  • Support: Help when things go wrong

  • Compute job scheduling

  • Software development, software artifact repositories

  • Container management

  • Professional staffing: Specialists in all of the above

  • More…

Compute1

  • $HOME directories are limited to 10GB. If you wish to observe your quota, you can use the following command:

    mmlsquota --block-size auto -u washukey rdcw-fs2:home1
  • In the Block Limits portion of the output, ‘blocks’ shows how much of the 10GB you have consumed.
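
If you want a quick estimate without reading the quota table, the following is a minimal sketch that expresses a directory's size as a percentage of a limit. The function name usage_percent is an assumption for this example. It relies on du, which only approximates what the quota system counts, so mmlsquota remains the authoritative number.

```shell
#!/bin/sh
# Sketch: report what percentage of a size limit a directory tree occupies.
# usage_percent is a hypothetical helper name; du only approximates quota usage.

# usage_percent DIR LIMIT_GB -> prints an integer percentage
usage_percent() {
    dir=$1
    limit_gb=$2
    # -s: one summary line, -k: 1 KiB units, -x: stay on one filesystem
    used_kb=$(du -skx "$dir" 2>/dev/null | cut -f1)
    used_kb=${used_kb:-0}
    limit_kb=$((limit_gb * 1024 * 1024))
    echo $((used_kb * 100 / limit_kb))
}
```

For example, usage_percent "$HOME" 10 would compare your home directory against the 10GB Compute1 limit.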

Compute2

  • $HOME directories are limited to 50GB.

  • Users can monitor their home directory space with the following command.

    $ df -h /home/elyn
    Filesystem                      Size  Used Avail Use% Mounted on
    home2.ris.wustl.edu:/home2-fs1  9.4G  1.4G  8.1G  15% /home
  • User $HOME directories are intended to give users enough space to make use of the compute platforms, with the understanding that the Storage Platforms are where data and software will be stored.

  • The $HOME directory is NOT backed up and important data should NOT be stored here. Anything you wish to be backed up, including scripts, should be placed in /storageN.

  • The $HOME directory is required for the Compute Platform(s) to function for users, and software often relies on it.

  • Policy dictates that users are limited to 10GB on Compute1.

  • Policy dictates that users are limited to 50GB on Compute2.

  • You can use the following command to list out the top 10 (or any number if you replace the 10) files or directories using the most space in your $HOME directory.

  • Make sure the following command is run from your $HOME directory.

du -hsx .[^.]* * 2>/dev/null | sort -rh | head -10
  • Expected example output.

800M .vscode-server
140M .local
95M work
68M .cache
41M .lsbatch
24M .nv
21M .matlab
20M .npm
20M .config
15M ondemand
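
If you run this check often, the command can be wrapped in a small function so that the directory and the number of entries become arguments. This is a minimal sketch; the name top_usage is made up for this example, and it uses the portable pattern .[!.]* in place of .[^.]* so that it also works in shells other than bash.

```shell
#!/bin/sh
# Sketch: list the N largest entries (including hidden ones) in a directory.
# top_usage is a hypothetical helper name.

# top_usage [DIR] [N] -> defaults: current directory, 10 entries
top_usage() {
    dir=${1:-.}
    count=${2:-10}
    (
        cd "$dir" || exit 1
        # .[!.]* matches hidden entries but skips "." and ".."
        du -hsx .[!.]* * 2>/dev/null | sort -rh | head -n "$count"
    )
}
```

For example, top_usage "$HOME" 5 would show the five largest entries in your home directory without requiring you to change into it first.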
  • This error typically refers to the ability of the job to write a file to a directory.

  • The most common source of the error is a user’s home directory being full.

  • If you encounter this error, please follow the steps below.

  • Requesting more resources for your job means using options that are part of the bsub command. You can find out more information about the bsub options at the following link.

  • Be aware that if the software you use requires special options in order to use these resources, you will need to include those options in your software command as well.
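
As an illustration of the two points above, here is a minimal sketch that assembles and prints (but does not submit) a bsub command requesting more job slots and memory. The function name build_bsub, the queue, the sizes, the program my_tool, and its --threads flag are all assumptions for this example. The options -q, -n, -M, and -R rusage[mem=...] are standard bsub options, but whether a unit suffix such as GB is accepted depends on how the cluster's LSF is configured.

```shell
#!/bin/sh
# Sketch: assemble (and only print) a bsub command that asks for more
# CPU slots and memory. build_bsub, my_tool, and --threads are hypothetical.

# build_bsub QUEUE SLOTS MEM_GB COMMAND...
build_bsub() {
    queue=$1
    slots=$2
    mem_gb=$3
    shift 3
    # -n: job slots; -M: memory limit; rusage[mem=...]: memory to reserve
    printf "bsub -q %s -n %s -M %sGB -R 'rusage[mem=%sGB]' %s\n" \
        "$queue" "$slots" "$mem_gb" "$mem_gb" "$*"
}

# The tool's own thread option should match the slots requested with -n.
build_bsub general 4 16 my_tool --threads 4 input.dat
```

Printing the command first makes it easy to confirm that the scheduler request and the software's own options agree before you submit anything.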

  • The first possible reason is a port conflict.

    • If your job lands on a node where another job is already using the port you are attempting to use, you will not be able to connect.

    • You can attempt to launch your job on a new node, or you can change the port you’re using and launch the job again.

  • The second possible reason is that some department-based VPNs are not part of the trusted network that allows this.

  • If you wish to avoid dealing with ports for GUI-based software, you can check out what software we have available through Open OnDemand.

  • You can also use port forwarding to get around the second reason for being unable to connect.
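
Both workarounds can be sketched in a few lines of shell. The example below picks a pseudo-random port from a range to reduce the chance of a clash, then prints an ssh local port-forwarding command for it. The function names pick_port and tunnel_cmd, the port range, and the host names are placeholders chosen for this example, not real RIS hosts; substitute the node your job is running on and the login host you normally use.

```shell
#!/bin/sh
# Sketch: choose a less common port and print the matching ssh tunnel command.
# pick_port, tunnel_cmd, and the host names below are hypothetical.

# pick_port [LOW] [HIGH] -> prints a pseudo-random port in [LOW, HIGH]
pick_port() {
    low=${1:-8000}
    high=${2:-8999}
    awk -v lo="$low" -v hi="$high" \
        'BEGIN { srand(); print lo + int(rand() * (hi - lo + 1)) }'
}

# tunnel_cmd PORT NODE LOGIN -> prints an ssh local port-forward command
tunnel_cmd() {
    printf 'ssh -L %s:%s:%s %s\n' "$1" "$2" "$1" "$3"
}

port=$(pick_port)
tunnel_cmd "$port" exec-node.example washukey@login.example
```

With such a tunnel open, the application would be reached at localhost on the chosen port from your workstation.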

We strive to provide help with software debugging and support to the best of our abilities and time. With that being said, there may be times when we cannot solve an issue related to a specific piece of software or script that is not supported by RIS. In those cases, we will attempt to provide a solution to the problem, but we cannot guarantee that the solution will be successful. We recommend reading this section for more help debugging your software as well as for guidance on software development best practices.

Data Storage Platforms

Many different types of research data can be stored in the Storage Platforms. To check whether your data is eligible, please see our information on what types of data qualify.

https://washu.atlassian.net/wiki/spaces/RUD/pages/1795325985/Storage+Platforms#Types-of-Data-That-Can-Be-Stored-in-the-Storage-Platforms

  • The Compute Service is connected to the Storage Service via POSIX filesystem mounts.

    • The batch (execution) nodes and condos are connected via cache.

    • The client and interactive nodes are connected directly.

  • The Storage Service provides the SMB interface at smb://storageN.ris.wustl.edu/${STORAGE_ALLOCATION}.

    • You can observe available space via SMB mounts with a df command on the mounting workstation.

    • This is for all current storage platforms.

    • This is also the method to use for Storage2 on the Compute Platforms.

df --output -h /storage2/fs1/${STORAGE_ALLOCATION}
  • The Compute Platforms provide a POSIX interface via the filesystem path /storageN/fs1/${STORAGE_ALLOCATION}.

    • You can observe available space by the mmlsquota command while logged into the compute platform.

    • This is for the Storage1 Platform.

mmlsquota --block-size auto -j washukey_active rdcw-fs1
  • Again, under the Block Limits section, the ‘blocks’ portion is how much you have consumed.
    The Compute Service uses a caching interface to access the data. Read more about how
    this affects usage and quota here: cache interfaces
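
To make the storageN placeholder concrete, here is a minimal sketch that expands a platform name and an allocation name into the SMB URL and the POSIX path described above. The function names smb_url and posix_path and the allocation name example_lab are made up for this example.

```shell
#!/bin/sh
# Sketch: expand the storageN placeholder into concrete paths.
# smb_url, posix_path, and example_lab are hypothetical names.

# smb_url PLATFORM ALLOCATION -> SMB URL for a workstation mount
smb_url() {
    printf 'smb://%s.ris.wustl.edu/%s\n' "$1" "$2"
}

# posix_path PLATFORM ALLOCATION -> path as seen on the Compute Platforms
posix_path() {
    printf '/%s/fs1/%s\n' "$1" "$2"
}

STORAGE_ALLOCATION=example_lab   # placeholder allocation name
smb_url storage1 "$STORAGE_ALLOCATION"
posix_path storage2 "$STORAGE_ALLOCATION"
```

The same allocation name is used in both forms; only the platform name and the access method change.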

If you are experiencing issues maintaining a stable connection to your storage allocation, please visit the storage service troubleshooting page.

  • The RIS team supplies a Speedtest application that will report the IP address of
    the browsing computer. Visit the Speedtest URL.

At the time of this writing, you can access storage service Allocations via:

  • SMB mounts from macOS, Linux, and Windows.

  • Globus Data Transfer endpoints.

  • The Compute Platforms.