Overview
Assumptions
The Storage Service Workflow
Features And Options
Storage Tiers: Archive
Snapshots
Storage Tape Backups
Enabling The Storage Service
Enabling the Archive Tier
Getting Connected
How to connect to storage from MacOS
How to connect to storage from Linux
How to connect to storage from Windows
Designing a Storage Layout
Moving Data Into The Storage Service
CHPC
Globus
Globus CLI
Globus Connect Personal
Rclone
gsutil
Access Control
Known Limitations
Calculating Free Space
Active Directory Group Management
Ignoring umask
Security Implications of SMB
Early Access, Design Changes, Implementation and Integration
FAQ
Overview
This is the User Manual for the Wash U IT Research Information Services (RIS) Storage Service.
...
Product Stage: General Availability
Assumptions
If you are reading this document, it is assumed that you are a member of the Washington University user community and that your work is related to the research mission of the University. We assume you have a Wash U WUSTL Key Identity, and that you are, or work for, Research Faculty or Staff. We assume you are on local Wash U computer networks or that you have access to either the Wash U Medical School or Danforth VPN. (See "How do I know what network I am on?" in the FAQ below.)
...
The Storage Service Workflow
The steps to enable and use a RIS Storage Allocation are summarized as follows:
You are a Wash U Research Faculty member, or a member of a research team.
You have a Wash U WorkDay Cost Center Number.
You have a WUSTL Key ID.
Visit the RIS Service Desk, go to the Storage Service section, and begin a Service Request for a new Allocation.
If you are a Faculty member requesting personal space, the Allocation Name will match your WUSTL Key ID.
If you represent a “Research Lab”, you can name the allocation after your lab. If so, indicate the name in your Service Request.
Indicate the Storage Amount. This is a quota: a size limit on the total amount of space requested.
Indicate the members of an Access List. This is a list of WUSTL Key IDs that should have access to your storage.
Indicate whether you intend to store Protected Data (e.g., Personal Health Information) in your storage allocation.
Indicate your WorkDay Cost Center Number.
...
Optionally, consider sub-dividing your Allocation into Project Subdirectories. These are sub-units of your Allocation that might need different access controls. If you have different data sets that you would like to control access to, call those “Projects” and give them a name. Then indicate an Access List of WUSTL Key IDs that are to have read-write access, and optionally a separate Access List for read-only access.
Features And Options
There are a number of features and options related to the RIS Storage Service.
Integrated with WUSTL Key ID
Integrated with RIS Data Transfer (Globus)
Integrated with RIS Compute services (See this FAQ).
Snapshots
Archive Tier
Active Tier with seamless expansion
Storage Tiers: Archive
We use the word “tier” to refer to a “performance level” of the storage service. Currently there are two tiers, “Active” and “Archive”. The Active tier is the standard storage tier you get by default. It is backed by several different storage components, including fast memory caching, but an End User should think of it this way: “Active storage is where I do daily work.” Think of it like “spinning disks”, even though it is more complicated than “just” spinning disks.
...
RIS intends to add tiers in the future, including a “local” tier directly attached to a Compute Service execution node and a “cloud” tier connected to cloud services such as AWS, GCP, or Azure.
Snapshots
Within the “Active” storage tier there is a directory named “.snapshots” that contains one week of daily snapshots of the Active storage space. If files in your Active space get overwritten, corrupted, or mistakenly deleted, you can copy previous versions out of the .snapshots directory back into Active.
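Restoring from .snapshots is an ordinary copy. The sketch below assumes that each snapshot appears as a subdirectory of .snapshots that mirrors the Active tree (.snapshots/<snapshot-name>/<path>). That layout and the list_snapshots and restore_file helper names are illustrative assumptions, not documented RIS tooling. The restored copy is written next to the live file with a .restored suffix, so nothing current is overwritten.

```shell
#!/usr/bin/env bash
# Sketch: restore a file from the .snapshots directory of an Active allocation.
# ASSUMPTION: each snapshot is a subdirectory of .snapshots that mirrors the
# Active tree, i.e. <active-root>/.snapshots/<snapshot-name>/<relative-path>.
# Helper names are hypothetical; no specific snapshot names are assumed.

# list_snapshots ACTIVE_ROOT
# Print the names of the snapshots currently available, one per line.
list_snapshots() {
  ls -1 "$1/.snapshots"
}

# restore_file ACTIVE_ROOT SNAPSHOT_NAME RELATIVE_PATH
# Copy the snapshot version of RELATIVE_PATH back into Active, next to the
# current file with a ".restored" suffix, so the live copy is not overwritten.
# Prints the path of the restored copy.
restore_file() {
  local active_root="$1" snap="$2" rel="$3"
  local src="$active_root/.snapshots/$snap/$rel"
  local dest="$active_root/$rel.restored"
  if [ ! -e "$src" ]; then
    echo "not found in snapshot: $src" >&2
    return 1
  fi
  # Recreate the parent directory in case it was also deleted from Active.
  mkdir -p "$(dirname "$dest")" || return 1
  # -p keeps timestamps and mode bits; -R also handles whole directories.
  cp -pR "$src" "$dest" || return 1
  printf '%s\n' "$dest"
}
```

For example, run restore_file /storage1/fs1/ris/Active <snapshot-name> path/to/file.txt with your own allocation path in place of the /storage1/fs1/ris/Active path used elsewhere in this manual. Choose <snapshot-name> from the names printed by list_snapshots. Once you have checked the .restored copy, rename it into place.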
Storage Tape Backups
The backup policy for both Active and Archive data has been fully vetted and approved by the Office of the Vice Chancellor for Research and the Office for Information Security.
The research storage infrastructure has been deemed compliant with all data retention guidelines.
Integrated into the storage environment is a high-performance, scalable tape robot that manages an 18-petabyte tape library. It allows data to be shuttled from live disk to much less expensive tape and back again on demand.
For both Active and Archive filesets, data remains on tape indefinitely unless it has been deleted on disk. Active data remains on tape for 30 days after the data has been deleted from disk.
If the data is never deleted from disk, then it remains on tape indefinitely with incremental backups.
Data in Archive also remains on dual-copy tape indefinitely unless it is deleted, in which case it remains on tape for 10 days after deletion.
The research storage environment also offers self-service snapshot data recovery for 7 days.
The preferred method of retaining a completed project is to request an Archive allocation and, once the project is complete, move its data from Active to Archive.
If the data needs to be accessed again after it has been moved to Archive, it can be migrated back to Active, and it will be restored from tape to disk.
The preferred method of moving data between Active and Archive is to bundle the data with tar or zip and then copy it with rsync.
Enabling The Storage Service
Visit the RIS Service Desk, click the Storage Platform section on the left, and begin a Service Request for a new Allocation by selecting “Activate a new storage allocation”.
If you are a Faculty member requesting personal space, the Allocation Name will match your WUSTL Key ID.
If you represent a “Research Lab”, you can name the allocation after your lab. If so, indicate the name in your Service Request.
Indicate the Storage Amount. This is a quota: a size limit on the total amount of space requested.
Indicate the members of an Access List. This is a list of WUSTL Key IDs that should have access to your storage.
Indicate whether you intend to store Protected Data (e.g., Personal Health Information) in your storage allocation.
Indicate your WorkDay Cost Center Number.
Please see our documentation for more information on activating a storage allocation.
Enabling the Archive Tier
Enabling the Archive tier is done simply by asking for it. Put in a Service Desk request and ask that the Archive tier be enabled for a named Storage Allocation.
Getting Connected
How to connect to storage from MacOS
How to connect to storage from Linux
How to connect to storage from Windows
Designing a Storage Layout
When you connect to your Storage Allocation, there is a standard filesystem layout:
...
Info: When creating directories or files, it is best practice to avoid spaces in names. If you need to separate parts of a name, use dashes. Linux environments do not handle spaces in names well, and spaces in directory and file names cause problems when working with the Compute Platform. Be precise in your naming as well: there is a 255-character limit on NTFS file name sizes. This is a hard limit of the system that the Storage/Compute platform uses, so any file to be transferred to Storage/Compute must be named within this limit or it cannot be transferred.
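Before a transfer, it can help to scan a directory tree for names that break these rules. The sketch below assumes a bash shell on Linux or macOS. find_problem_names and suggest_name are hypothetical helper names. The length check counts characters as the shell's locale sees them, which approximates, but may not exactly match, how NTFS counts its 255-character limit.

```shell
#!/usr/bin/env bash
# Sketch: find names that break the naming guidance above before transferring.
# Helper names are hypothetical.

# find_problem_names DIR
# Print every file or directory under DIR whose own name contains a space
# or is longer than 255 characters.
find_problem_names() {
  local root="$1" long_glob
  # 256 "?" wildcards followed by "*" matches any name of 256 or more characters.
  long_glob="$(printf '?%.0s' $(seq 256))*"
  find "$root" \( -name '* *' -o -name "$long_glob" \) -print
}

# suggest_name NAME
# Print NAME with each space replaced by a dash (the separator recommended above).
suggest_name() {
  printf '%s\n' "${1// /-}"
}
```

For example, find_problem_names ~/data-to-upload lists the offending paths. suggest_name 'scan results 2021.csv' prints scan-results-2021.csv.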
Moving Data Into The Storage Service
Info: Please see our Compute Data Transfer Policy if you will be transferring data to and from your storage allocation using compute1.
CHPC
Instructions for moving data from CHPC
Globus
Instructions for moving data with Globus:
Globus CLI
Instructions for moving data with Globus CLI:
Globus Connect Personal
Instructions for installing and using Globus Connect Personal:
Rclone
gsutil
Instructions for moving data from Google storage with gsutil
Access Control
Instructions for how to manage access to your data in the Storage Service.
Known Limitations
The Storage Service includes a feature set documented in these pages. Each feature or capability has limitations or caveats.
Calculating Free Space
Use SMB to determine free space in a Storage Service Allocation.
...
$ du -sh --apparent-size /storage1/fs1/ris/Active/
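If you know your allocation's quota (the Storage Amount from your Service Request), the du figure can give a rough estimate of remaining space. This sketch makes several assumptions. It needs GNU du, it expects the quota in bytes, and remaining_bytes is a hypothetical helper. The free space reported over SMB remains the authoritative figure, because this estimate cannot account for anything du does not see.

```shell
#!/usr/bin/env bash
# Sketch: estimate remaining space as (quota - apparent size used).
# Requires GNU du (as on Linux). The quota is supplied by you, in bytes.
# remaining_bytes is a hypothetical helper; trust the SMB-reported figure first.

# remaining_bytes QUOTA_BYTES DIR
remaining_bytes() {
  local quota="$1" dir="$2" used
  # --apparent-size with a 1-byte block size reports the total in bytes.
  used="$(du -s --apparent-size --block-size=1 "$dir" | cut -f1)"
  [ -n "$used" ] || return 1
  printf '%s\n' "$(( quota - used ))"
}
```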
Active Directory Group Management
Members May Be Removed From Groups
...
$ getent group storage-ris-itsm-rw
storage-ris-itsm-rw:*:1250923:david.prince,shawn.m.leonard,dhallan,jansen,catherine.morie,tz-kai.lin,sleong,cspohl
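Because members may be removed from groups, you may want to check whether a particular WUSTL Key ID still appears in an allocation's group. The sketch below parses the name:password:GID:members format shown above. The helper names are hypothetical, and the sketch assumes the directory service returns the full member list through getent. Running id -nG <user> is another way to see a user's groups.

```shell
#!/usr/bin/env bash
# Sketch: check whether a user still appears in an allocation's AD group.
# Helper names are hypothetical.

# list_members ENTRY
# ENTRY is one line of "getent group" output: name:password:GID:member1,member2,...
# Prints one member per line (nothing if the member list is empty).
list_members() {
  local members="${1##*:}"
  if [ -n "$members" ]; then
    printf '%s\n' "$members" | tr ',' '\n'
  fi
}

# is_group_member GROUP USER
# Exit status: 0 if USER is listed, 1 if not, 2 if GROUP cannot be looked up.
is_group_member() {
  local entry
  entry="$(getent group "$1")" || return 2
  # -x whole-line match and -F literal string, so "david" does not match "david.prince".
  list_members "$entry" | grep -qxF "$2"
}
```

For example, is_group_member storage-ris-itsm-rw dhallan && echo member checks the group and user from the output above.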
Ignoring umask
When a file or directory is created with an inherited Access Control Entry (ACE), the POSIX “umask” is ignored. The umask normally determines the basic traditional POSIX permissions on new files. By default, all folders in an allocation carry inheritable ACL entries and therefore display this behavior. For the permissions on a new file to reflect the umask setting, the file must be created in a directory whose ACL has been modified to remove the inheritance flags or entries. The relevant vendor (IBM) and the IETF (see the NFS ACL RFCs) confirm that this is the intended behavior. One example of where this can cause an issue is a git repository whose permission settings conflict with the default ACLs.
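To check whether this behavior applies in a given directory, create a file there and compare its mode with what the umask alone would produce. This is a heuristic sketch with a hypothetical helper name. A mismatch is consistent with an inherited ACE overriding the umask, but the check does not inspect the ACL itself.

```shell
#!/usr/bin/env bash
# Sketch: check whether new files in a directory actually follow your umask.
# Heuristic only; umask_respected is a hypothetical helper, not an ACL tool.

# umask_respected DIR
# Creates a short-lived probe file in DIR, compares its mode with 0666 & ~umask,
# removes the probe, and exits 0 on a match, 1 on a mismatch, 2 on error.
umask_respected() {
  local dir="$1" probe mode expected
  probe="$dir/.umask-probe.$$"
  # Shell redirection creates the file with mode 0666 filtered by the umask
  # (unlike mktemp, which always uses 0600).
  : > "$probe" || return 2
  # GNU stat uses -c; BSD/macOS stat uses -f.
  mode="$(stat -c '%a' "$probe" 2>/dev/null || stat -f '%Lp' "$probe")"
  rm -f "$probe"
  # Prefixing "0" makes the umask value octal inside $(( )).
  expected="$(printf '%o' "$(( 0666 & ~0$(umask) ))")"
  [ "$mode" = "$expected" ]
}
```

Run umask_respected /path/to/dir; echo $? for the directory you care about. A result of 0 means the umask was honored and 1 means it was not.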
Security Implications of SMB
Protocols like SMB evolve over time as a result of feature changes or security vulnerabilities. We expect users to use SMB3.
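On a Linux client, the SMB dialect negotiated for each mount typically appears as a vers= mount option, so you can check SMB3 use from the mount table. The helpers below have hypothetical names. The mount command in the comments uses placeholder server and share names, not real RIS values. On macOS, smbutil statshares -a should report the SMB version of each mounted share.

```shell
#!/usr/bin/env bash
# Sketch: confirm that SMB/CIFS mounts on a Linux client use an SMB 3.x dialect.
# The Linux kernel typically reports the dialect as "vers=..." in mount options.
#
# A dialect can be requested explicitly with the mount.cifs "vers" option.
# Server, share, and mount point are HYPOTHETICAL placeholders:
#   sudo mount -t cifs //<server>/<share> /mnt/<mount-point> -o vers=3.0,username=<wustlkey>

# smb_version_ok OPTIONS
# Exit 0 if the comma-separated mount OPTIONS contain a vers= value of 3.x.
smb_version_ok() {
  local vers
  vers="$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^vers=//p')"
  case "$vers" in
    3*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_cifs_mounts
# Print each CIFS/SMB3 mount point prefixed with OK (SMB 3.x) or CHECK (anything else).
check_cifs_mounts() {
  findmnt -rn -t cifs,smb3 -o TARGET,OPTIONS | while read -r target opts; do
    if smb_version_ok "$opts"; then
      printf 'OK     %s\n' "$target"
    else
      printf 'CHECK  %s\n' "$target"
    fi
  done
}
```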