FAQ
storageN
The use of storageN within these documents indicates that any storage platform can be used. Currently available storage platforms:
storage1
storage2
storage3
General
RIS is the Research Infrastructure Services team within Wash U IT. We are a young group, having been incorporated into Wash U IT in 2018. The RIS team strives to build and deliver services related to the research mission of Wash U IT. To date, our service catalog only includes Research Storage and will soon include Research Computing. Our product roadmap includes Application Platforms. Generally speaking, RIS attempts to find common patterns in the needs of Wash U research faculty and develops services that address those needs. We do not solve all research IT problems; rather, we attempt to reduce the costs of the “big ticket items” that represent significant challenges to researchers: high performance and high capacity storage, high performance and high throughput computing, large scale data transfer, and some narrowly focused research applications.
RIS uses the RIS Service Desk to interact with users. Each category of the Service Desk includes an “Ask a question” option related to that service. This is not the same thing as ServiceNow, which some see as a deficiency and others as an advantage. We will be working to integrate with Wash U IT and link our systems to make it easier for users to contact us.
RIS services are described in the RIS Service Catalog.
RIS services are for Wash U faculty and staff, focused on the research mission of Wash U.
It is assumed that users will be research faculty and staff and have WashU Key IDs, have access to Wash U IT Networks, and have access to a departmental WorkDay Cost Center Number.
With these prerequisites in place, you can get started with our onboarding services by filling out the form here: Get Onboarded to RIS
Below is a guide on how to use the Service Desk to request new RIS services.
There are a few ways that users can get help with RIS Services.
The documentation, especially the FAQ (where you currently are).
The chatbot that is part of the RIS Service Desk. It has access to our documentation and can provide assistance.
Submitting a ticket via the RIS Service Desk.
You are reading it! RIS service documentation lives at this Confluence site in two places:
These descriptions explain how each launch stage applies to RIS.
Early Access
Early-access features are limited to a closed group of testers for a limited subset of launches. Participation is by invitation only. These features may be unstable, change in backward-incompatible ways, and are not guaranteed to be released. There are no SLAs provided and no technical support obligations.
Alpha
Alpha is a limited-availability test before releases are cleared for more widespread use. Our focus with alpha testing is to verify functionality and gather feedback from a limited set of customers. Typically, alpha participation is by invitation and subject to pre-general-availability terms. Alpha releases don’t have to be feature complete, no SLAs are provided, and there are no technical support obligations. However, alphas are generally suitable for use in test environments.
Beta
At beta, products or features are ready for broader customer testing and use. There are no SLAs or technical support obligations in a beta release unless otherwise specified in product terms or the terms of a particular beta program.
General Availability
General availability products and features are open to all customers, covered by the RIS SLAs, and are ready for production use.
Deprecated
Deprecated features are scheduled to be shut down and removed.
Note: Depending on product maturity and engineering needs, a RIS product or feature may not go through every launch stage, and the time between launch phases may vary.
The Research Datacenter is physically located at:
222 S Newstead Ave.
St. Louis, MO 63110
You can take a virtual tour of the data center
Accounts for RIS Compute Services use WashU Key credentials, so you must obtain one of those before you can get an account with RIS Compute Services:
See this page for help with WashU Keys.
Because RIS Compute Services uses WashU Key credentials, you change your password by changing your WashU Key password. Wash U IT has documentation on resetting WashU Key passwords:
If you are looking to collaborate with someone outside WashU, you will need to have a WashU guest account created for the user.
Please use this link for starting this process: https://connect.wustl.edu/guest/guestrequest/
Once you have the guest account set up through that process, you can create a ticket in our Service Desk and we can get the user added to the appropriate allocations.
Scientific Compute Platforms
See our Quick Start guide on how to get connected.
We suggest you make use of SSH keys to log into the compute clients; see SSH Key-Pair Setup.
See our Quick Start guide for submitting your first job. Further information can be found elsewhere in this documentation for more complex examples.
Best practices for file naming are covered in our documentation. You can find that information at the following links.
Faculty members may purchase dedicated hardware for their labs to form what we refer to as a “condominium”. In this model, a “condo” is formed out of a set of hardware that we put into a Host Group.
Then we create a Queue/Partition named after the Lab. E.g. for Compute1: labname and labname-interactive.
Then we create an AD group named compute-labname and populate it with Users. That group then gets priority access to that lab.
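The naming convention above can be sketched as a small shell function. This is only an illustration of the pattern for Compute1 as described here; "smithlab" is a hypothetical lab name, and the function does not create anything.

```shell
#!/bin/sh
# Sketch of the condo naming convention described above.
# "smithlab" is a hypothetical lab name used only for illustration.
condo_names() {
    lab=$1
    printf 'queue: %s\n' "$lab"
    printf 'interactive queue: %s-interactive\n' "$lab"
    printf 'AD group: compute-%s\n' "$lab"
}

condo_names smithlab
```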
Yes. The “general” and “interactive” job queues are serviced by a set of execution nodes in a Host Group named general and general-interactive on Compute1 and general-cpu, general-gpu, general-interactive, general-short, general-preempt-cpu, and general-preempt-gpu on Compute2.
The general queue runs batch jobs much like the traditional HPC setting. They run in the background in the queue system.
The general queue also makes use of a cache system, which you can learn more about here.
Jobs in the general queue can run for up to 28 days.
The general-interactive queue runs jobs interactively so that you can interact directly with them or watch a job.
The general-interactive queue does not use the cache system and instead interfaces with the Storage Platform directly.
Jobs in the general-interactive queue can run for up to 24 hours.
Please see the general queue policies for more information.
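As a rough aid for choosing between the two queues, the run-time limits stated above (28 days and 24 hours) can be turned into a quick check. This sketch considers run time only; it ignores every other queue policy, and the function name is made up for this example.

```shell
#!/bin/sh
# Run-time limits stated in this FAQ, expressed in hours.
GENERAL_MAX_HOURS=$((28 * 24))   # 28 days = 672 hours
INTERACTIVE_MAX_HOURS=24

# Print which of the two queues could hold a job of the given length in hours.
queues_for_runtime() {
    hours=$1
    if [ "$hours" -le "$INTERACTIVE_MAX_HOURS" ]; then
        echo "general general-interactive"
    elif [ "$hours" -le "$GENERAL_MAX_HOURS" ]; then
        echo "general"
    else
        echo "none"
    fi
}

queues_for_runtime 12
queues_for_runtime 100
queues_for_runtime 700
```

A job longer than 28 days fits neither queue, so it would need to be split or checkpointed.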
Colocation facilities worthy of hosting production quality computing hardware, datacenter space
Power and cooling of the physical space
Physical security
Identity Management: User accounts, groups, access controls and permissions
Execution nodes: Varying by CPU flavor, speed, RAM quantity, local hard drive space, etc.
Networking: All of the above for networking systems
Storage: All of the above for storage systems
Data security: Operating system and software updates, incident response
Integration: Interconnects that provide appropriate bandwidth and Input/Output operations per second
Integration with Cloud Services
Integration with storage tiers, tape libraries, tape robots, data movers
Integration with data movement, specialized technologies like Globus
Operations: Monitoring, alerting, event response
Support: Help when things go wrong
Compute job scheduling
Software development, software artifact repositories
Container management
Professional staffing: Specialists in all of the above
More…
Compute1
$HOME directories are limited to 10GB. If you wish to observe your quota, you can use the following command:
mmlsquota --block-size auto -u washukey rdcw-fs2:home1
Under the Block Limits portion, ‘blocks’ is how much of the 10GB you have consumed.
Compute2
$HOME directories are limited to 50GB. Users can monitor their home directory space with the following command.
$ df -h /home/elyn
Filesystem                      Size  Used Avail Use% Mounted on
home2.ris.wustl.edu:/home2-fs1  9.4G  1.4G  8.1G  15% /home
User $HOME directories are intended to allow space for users to make use of the compute platforms, with the knowledge that the Storage Platforms are where data and software will be stored.
The $HOME directory is NOT backed up and important data should NOT be stored here. Anything you wish to be backed up should be placed in /storageN; this includes scripts.
The $HOME directory is required for the Compute Platform(s) to function for users, and software often relies on it.
Policy dictates that users are limited to 10G on Compute1.
Policy dictates that users are limited to 50G on Compute2.
You can use the following command to list out the top 10 (or any number if you replace the 10) files or directories using the most space in your $HOME directory.
Make sure the following command is run from your $HOME directory.
du -hsx .[^.]* * 2>/dev/null | sort -rh | head -10
Expected example output.
800M .vscode-server
140M .local
95M work
68M .cache
41M .lsbatch
24M .nv
21M .matlab
20M .npm
20M .config
15M ondemand
This error typically refers to the ability of the job to write a file to a directory.
The most common source of the error is a user’s home directory being full.
If you encounter this error, please follow the steps below.
Use the methods described in the home directory space section to determine whether the home directory is at its cap.
Remove or move files from the home directory to reduce usage.
Attempt to run the job again.
If the problem persists, submit a ticket to the service desk: https://ris.wustl.edu/support/service-desk/
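The first step above can be approximated with a short script. This is a minimal sketch, not an official RIS tool: it measures disk usage with du, which may differ slightly from what the quota system reports, and the function name is invented for this example. The 10 and 50 GB caps come from the policy stated in this FAQ.

```shell
#!/bin/sh
# Report whether a directory has reached a size cap given in GB.
# du -sk reports usage in 1 KiB blocks, so the cap is converted to KiB.
home_cap_status() {
    dir=$1
    cap_gb=$2
    used_kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
    used_kb=${used_kb:-0}            # treat an unreadable directory as empty
    cap_kb=$((cap_gb * 1024 * 1024))
    if [ "$used_kb" -ge "$cap_kb" ]; then
        echo "at or over cap"
    else
        echo "under cap"
    fi
}

# 10 is the Compute1 limit from this FAQ; use 50 on Compute2.
home_cap_status "$HOME" 10
```

If the script reports that the directory is at or over its cap, the du listing shown earlier in this FAQ helps find what to move or remove.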
Requesting more resources for your job means using options that are part of the bsub command. You can find out more information about the bsub options at the following link.
Be aware that if the software you use requires special options in order to use these resources, you will need to include those options in your software command as well.
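To illustrate the two points above, here is a dry-run sketch that only prints a bsub command line rather than submitting it. The -q, -n, and -R "rusage[mem=...]" options are standard LSF options; the queue name comes from this FAQ, while the core count, memory value, script name, and its --threads option are hypothetical placeholders. Your jobs may also need other options not shown here.

```shell
#!/bin/sh
# Dry-run sketch: print a bsub command line instead of submitting it.
# The queue name "general" comes from this FAQ; the core count, memory
# value, and job command are hypothetical placeholders.
build_bsub_command() {
    queue=$1      # queue to submit to
    cores=$2      # number of job slots requested with -n
    mem=$3        # memory reservation; accepted units depend on the LSF setup
    shift 3
    printf 'bsub -q %s -n %s -R "rusage[mem=%s]" %s\n' "$queue" "$cores" "$mem" "$*"
}

# The scheduler reserves 4 slots, and the software is told to use 4 threads.
build_bsub_command general 4 16GB ./my_analysis.sh --threads 4
```

Note that the request for four slots appears twice: once for the scheduler (-n 4) and once for the software itself (--threads 4), since asking the scheduler for resources does not make the software use them.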
RIS offers RIS hosted and controlled Docker images.
You can find them here: https://washu.atlassian.net/wiki/spaces/RUD/pages/1705738300
RIS offers software through our THPC Docker image, found here: https://washu.atlassian.net/wiki/spaces/RUD/pages/1782874484
You can find the list of software here: https://washu.atlassian.net/wiki/spaces/RUD/pages/1696039041
RIS also offers a list of non-RIS developed containers, where we do not control the Docker image nor host it.
You can find that list here: https://washu.atlassian.net/wiki/spaces/RUD/pages/1861451878
You can request help building a Docker image if you are having trouble via our ticketing system.
Software that is used frequently is taken into consideration when creating RIS hosted and controlled Docker images.
We currently do not have a public repository for users to host their own images in.
The first reason this could be happening is a port conflict.
If your job lands on a node where another job is already using the port you are attempting to use, you will not be able to connect.
You can attempt to launch your job on a new node, or you can change the port you’re using and launch the job again.
The second reason this could be happening is that some department-based VPNs are not part of the trusted network that allows this connection.
Please see our VPN information for which VPNs we recommend.
If you wish to avoid dealing with ports for GUI-based software, you can check out what software we have available through Open OnDemand.
You can also use port forwarding to get around the second reason for being unable to connect.
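Both workarounds can be sketched in shell. The first function picks a port that is not in a list of ports you already know to be taken; how that list is obtained on the node is left open. The second prints (but does not run) an ssh local port forward using the standard -L option. All host names and port numbers here are hypothetical placeholders, not RIS endpoints.

```shell
#!/bin/sh
# Pick the first port in a range that is not in a list of ports already in use.
# The list of used ports is passed in as a space-separated string.
first_unused_port() {
    used=" $1 "
    candidate=$2
    last=$3
    while [ "$candidate" -le "$last" ]; do
        case "$used" in
            *" $candidate "*) candidate=$((candidate + 1)) ;;
            *) echo "$candidate"; return 0 ;;
        esac
    done
    return 1    # every port in the range is taken
}

# Print (do not run) an ssh local port forward. Host names are hypothetical.
forward_command() {
    fwd_port=$1
    node=$2
    login=$3
    printf 'ssh -L %s:%s:%s %s\n' "$fwd_port" "$node" "$fwd_port" "$login"
}

chosen=$(first_unused_port "8888 8889" 8888 8899)
forward_command "$chosen" exec-node.example user@login.example
```

With the forward in place, the application would be reached at localhost on the chosen port from the workstation running ssh.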
We strive to provide help with software debugging and support to the best of our abilities and time. With that being said, there may be times when we cannot solve an issue related to a specific piece of software or script that is not supported by RIS. In those cases, we will attempt to provide a solution to the problem, but we cannot guarantee that the solution will be successful. We recommend reading this section for more help debugging your software as well as for guidance on software development best practices.
Data Storage Platforms
Many different types of research data can be stored in the Storage Platforms. To check whether your data is eligible to be stored there, please see our information on what types of data are qualified.
The Compute Service is connected to the Storage Service via POSIX filesystem mounts.
The batch (execution) nodes and condos are connected via cache.
The client and interactive nodes are connected directly.
The Storage Service provides the SMB interface at smb://storageN.ris.wustl.edu/${STORAGE_ALLOCATION}.
You can observe available space via SMB mounts with a df command on the mounting workstation.
This is for all current storage platforms.
This is also the method to use for Storage2 on the Compute Platforms.
df --output -h /storage2/fs1/${STORAGE_ALLOCATION}
The Compute Platforms provide a POSIX interface via the filesystem path /storageN/fs1/${STORAGE_ALLOCATION}.
You can observe available space with the mmlsquota command while logged into the compute platform.
This is for the Storage1 Platform.
mmlsquota --block-size auto -j washukey_active rdcw-fs1
Again, under the Block Limits section, the ‘blocks’ portion is how much you have consumed.
The Compute Service uses a caching interface to access the data. Read more about how this affects usage and quota here: cache interfaces
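The two access patterns given above, the SMB URL and the POSIX path, differ only in the platform and allocation names. A small sketch that fills them in, assuming "storage1" as the platform and a hypothetical allocation called "mylab":

```shell
#!/bin/sh
# Build the two access paths this FAQ gives for a storage allocation.
# "storage1" is one of the listed platforms; "mylab" is a hypothetical
# allocation name used only for illustration.
smb_url() {
    printf 'smb://%s.ris.wustl.edu/%s\n' "$1" "$2"
}

posix_path() {
    printf '/%s/fs1/%s\n' "$1" "$2"
}

smb_url storage1 mylab
posix_path storage1 mylab
```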
You can request access be granted to your colleagues through our ticketing system.
You can also use collections within Globus to share specific folders or files with colleagues. This is the suggested method for colleagues outside of WashU. You can find more information about using this feature here:
The first method we recommend is to use SMB mounts. You can find more information about connecting at the following link.
Our suggested method of transferring data if SMB is not an option is to make use of Globus. You can use Globus in multiple ways. There are links to our Globus documentation below.
If you are experiencing issues maintaining a stable connection to your storage allocation, please visit the storage service troubleshooting page.
The RIS team supplies a Speedtest application that will report the IP address of the browsing computer. Visit the Speedtest URL.
At the time of this writing, you can access storage service Allocations via:
SMB mounts from macOS, Linux, and Windows.
Globus Data Transfer endpoints.
The Compute Platforms.