Compute: Space Management
Compute Data Transfer Policy
Supported methods to transfer data into and out of the Scientific Compute Platform are:

- Submitting a job using a data transfer tool (e.g. rsync, wget, curl, scp)

Please do not use these data transfer tools directly on the compute1 client nodes. This type of activity can slow down the client nodes and negatively affect all users connected to them.

If you require assistance submitting jobs using these tools, please open a ticket at our Service Desk.
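As a sketch of the recommended approach, a transfer can be wrapped in a batch job rather than run on a client node. The queue name, group name, container image, and paths below are placeholders, not values from this document; substitute the ones appropriate for your lab.

```shell
# Hypothetical example: submit an rsync transfer as an LSF batch job.
# -G names your compute group, -q the queue, -a the container image;
# all values shown here are placeholders.
bsub -G compute-foo -q general \
     -oo transfer.%J.log \
     -a 'docker(alpine:latest)' \
     rsync -av /path/to/source/ /path/to/destination/
```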
Storage Service Integration Points
In addition to the Storage Service SMB Interface to a given Allocation, the Compute Service exposes three additional interfaces to the Storage Service. This brings the total number of different types of locations where RIS Storage Space can be consumed to four:
- Storage Service Allocations
- Storage Service Allocations’ Caches
- Home directories in the Compute Platform
- Scratch space directories in the Compute Platform
Each of these location types has different methods and policies for managing and inspecting usage. This helps balance the availability of space and performance with the capabilities of the resources that provide them.
Checking Storage Usage
Storage Service Allocations
An accurate report of a Storage Service Allocation’s space consumption can only be obtained through the Storage Service SMB Interface. This is a known limitation of the Storage Service.
SMB Interface
The easiest way to do this is to mount the allocation to your desktop, right-click the mounted folder, and select the appropriate menu option for more information.
Alternatively, maximum and available space can be obtained with smbclient:

$ smbclient -A .smb_creds -k //storage1.ris.wustl.edu/ris -c du
137438953472 blocks of size 1024. 135136394752 blocks available
Total number of bytes: 14619
The “blocks of size 1024” figure is the limit of how much total space can be consumed, and the “blocks available” figure is how much of that limit is not consumed. [2]
These values can be converted to TiB [1]:

$ bc -q
scale=2
137438953472/1024^3
128.00
135136394752/1024^3
125.85
Or subtracted from each other to calculate space used:

(137438953472-135136394752)/1024^3
2.14
Telling us that this Allocation is using 2.14 TiB out of a 128 TiB limit.
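The arithmetic above can be folded into a one-liner. This is a sketch that parses the two figures from the `du` output format shown earlier; in practice you would pipe real smbclient output into awk instead of the echoed example line.

```shell
# Convert the "blocks of size 1024" (total) and "blocks available" figures
# into TiB used / TiB total. Field $1 is the limit, field $6 is available.
echo "137438953472 blocks of size 1024. 135136394752 blocks available" |
awk '{ printf "used: %.2f TiB of %.2f TiB\n", ($1-$6)/1024^3, $1/1024^3 }'
```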
Cache Interface
The cache interface can be measured more simply using a shell on a Compute Platform execution node or condo:
$ df -Ph /storage1/fs1/ris
Filesystem  Size  Used  Avail  Use%  Mounted on
cache2-fs1  128T  3.3T  125T   3%    /storage1/fs1
Notice that this usage is over a TiB more than what SMB reported! That’s because it is a different location. Rewriting the same 1 TiB file with different data three times would consume 3 TiB on the cache, but ultimately only use 1 TiB on the Storage Service Allocation, where the data gets flushed for “permanent” storage.
Similarly, data written via SMB, the client (login) nodes, or the interactive nodes, but not accessed by a Compute Platform execution node or condo, will not be pulled into the cache, possibly resulting in a cache interface usage that is lower than the underlying Storage Service Allocation’s actual usage.
Again, this discrepancy is a known limitation of the Storage Service.
Because data can be modified from the execution node and condo cache interface, the SMB interface, the client (login) nodes, and the interactive nodes, the possibility of conflicts arises. Conflicts happen when a file is deleted via SMB before the same file has finished writing back to the home fileset from the cache, or when a file is modified at the same path from both the cache and another source before the cache has written the file back to the home fileset. The cache fileset will detect these conflicts and move the data into hidden directories for manual review, to prevent accidental loss of in-flight data. The data in these hidden directories counts towards the cache fileset usage.
If you encounter a conflict, contact the RIS Service Desk for assistance.
Compute Service Home Directories
Every Compute Service User is assigned a limit of 9 GiB of home directory space on the Compute Platform. This space is restricted at the user level, and can only be checked with the appropriate mmlsquota command:

mmlsquota -u foouser cache1-fs1:home1
For example, the current logged in user’s home directory usage in automatically scaled units can be obtained like so:
$ mmlsquota -u $(id -nu) --block-size auto cache1-fs1:home1
                        Block Limits                     |     File Limits
Filesystem Fileset type   blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks
cache1-fs1 home1   USR     1.37G    9G   10G        0  none |  5961     0     0        0  none cache1-gpfs-cluster.ris.wustl.edu
There is no SMB interface to this space, and df reports space for the entire device, which is shared among all home directories.
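Since df cannot isolate a single home directory, du can be used as a rough cross-check of how much data the directory actually holds. This is only an approximation of what mmlsquota reports: du walks the whole directory tree, so it can be slow, and it knows nothing about quota limits or grace periods.

```shell
# Approximate home directory usage by summing the tree under $HOME.
# -s gives a single total, -h scales units automatically.
du -sh "$HOME"
```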
Compute Service Scratch Space
High-performance Scratch Space is typically allocated for each lab as it is onboarded to the Compute Service. This space is restricted at the group level, which should represent an eponymous lab. Because it is a shared device like that for home directories, this usage must also be inspected with the appropriate mmlsquota command, referencing a group name and group quota on the scratch device:

mmlsquota -g compute-foo scratch1-fs1
To see the usage of every compute group the current logged in user belongs to, in automatically scaled units, try something like:
$ groups | grep -Po 'compute-\S+' | while read COMPUTE_GROUP
> do COMPUTE_ALLOCATION="${COMPUTE_GROUP#compute-}"
> ls -ld "/scratch1/fs1/${COMPUTE_ALLOCATION}"
> mmlsquota -g "$COMPUTE_GROUP" --block-size auto scratch1-fs1
> done
drwxr-sr-x. 6 root compute-ris 4096 Aug 23 02:43 /scratch1/fs1/ris
Disk quotas for group compute-ris (gid 1208827):
                  Block Limits                     |     File Limits
Filesystem   type  blocks quota limit in_doubt grace |    files quota limit in_doubt grace Remarks
scratch1-fs1 GRP   2.168T    3T    4T        0  none | 33226772     0     0        0  none
drwx--S---. 2 root compute-corcoran.william.p 4096 Aug 23 02:24 /scratch1/fs1/corcoran.william.p
Disk quotas for group compute-corcoran.william.p (gid 1262586):
                  Block Limits                     |     File Limits
Filesystem   type  blocks quota limit in_doubt grace |    files quota limit in_doubt grace Remarks
scratch1-fs1 GRP        0   50T   50T        0  none |        3     0     0        0  none
Staging Data
The Compute Service home directories and Scratch Space are not accessible from outside of the Compute Platform. Data should be staged to these locations from a Storage Service Allocation, and computational result or job output data should then be staged back to a Storage Service Allocation.
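The staging round-trip can be sketched with rsync, one of the supported transfer tools listed above. The allocation and scratch paths here are placeholders for your lab's actual locations, and on the real platform these commands would run inside a submitted job, not on a client node.

```shell
# Stage inputs from the Storage Allocation into Scratch Space
# (placeholder paths -- substitute your lab's allocation and scratch dirs):
rsync -a /storage1/fs1/foo/Active/inputs/ /scratch1/fs1/foo/inputs/

# ... run computational jobs against the scratch copy ...

# Stage results back to the Storage Allocation for permanent storage:
rsync -a /scratch1/fs1/foo/results/ /storage1/fs1/foo/Active/results/
```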
Additional Notes
Note that your compute lab group and your storage lab group are not the same. That is, the memberships of storage-foo and compute-foo are likely intentionally different, for specific and meaningful reasons.
[1] Binary terabytes, or “tebibytes”, are base 1024 (that is, there are 1024 gibibytes in every tebibyte, and so on). This comes from the interval between SI suffixes on computers historically being represented by ten binary digits, which is 1024 units in decimal. They are commonly labeled as “TB”, although this can lead to a problematic loss of precision when comparing with values calculated using base 1000.

[2] The figures representing limits and available space are not necessarily a guarantee that space is available. It is possible for space to be overprovisioned. This happens when the total space available to all users is less than the sum of their quotas. Thus, as every user approaches their quota, there is a potential lower effective limit if the total space for all users is exhausted first.