Copying Data Between Compute Clusters
Target Audience
This document describes a procedure for transferring data between the McDonnell Genome Institute's compute0 cluster and the WUIT RIS compute1 cluster.
- You must have login credentials for both compute environments.
- You must have read/write permissions to the relevant storage volumes.
- Be mindful of your $USER name on both clusters; some users have differing user IDs.
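Since user IDs can differ between the clusters, it is worth checking before you start; a quick way is to run this on each side and compare:

```shell
# Run on each cluster and compare the output: the names should match,
# but differing numeric UIDs matter for file ownership on shared storage.
id -un   # your user name
id -u    # your numeric user ID
```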
Build or find a container with ssh and rsync present
This Dockerfile constructs an Ubuntu-based container with rsync and openssh:
```
cat > Dockerfile <<EOF
FROM ubuntu

# Tell apt-get that we're not paying attention
ENV DEBIAN_FRONTEND noninteractive

# First layer: an up to date starting base OS
RUN sed -i 's/^# deb /deb /' /etc/apt/sources.list \
    && apt-get update

# Next: Add desired packages and clean up
RUN apt-get install -y --no-install-recommends \
    libnss-sss \
    openssh-client openssh-server rsync \
    && apt-get clean all \
    && rm -r /var/lib/apt/lists/*
EOF
```
Build the container and push it to Docker Hub:

```
docker build . -t ${REGISTRY}${OWNER}/${NAME}:latest
docker push ${REGISTRY}${OWNER}/${NAME}:latest
```
Feel free to use my container, mcallaway/rsync:latest.
Prepare an SSH key on the compute cluster client nodes
We need to create two SSH keys: a "user" key for the sending ssh client side, and a "host" key for the sshd server side. You can create both keys in your $HOME directory on compute1, then copy them over to compute0 so they exist on both sides. Strictly, the sender only needs the user key and the server only needs the host key, but these instructions copy both just for uniformity.
So, on compute1:
```
cd $HOME
mkdir ./etc
```
Create a key for use by sshd server:
ssh-keygen -t rsa -f etc/ssh_host_rsa_key -N ''
Create a key for use by ssh client:
ssh-keygen -t rsa -f etc/ssh_user_rsa_key -N ''
Add this user key to your ~/.ssh/authorized_keys file on the compute1 side:
cat etc/ssh_user_rsa_key.pub >> ~/.ssh/authorized_keys
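Before going further, you can confirm that the two halves of a key pair actually match by comparing fingerprints; ssh-keygen prints the same fingerprint for a private key and its .pub:

```shell
# The two fingerprints must be identical for an intact key pair
ssh-keygen -lf etc/ssh_user_rsa_key.pub
ssh-keygen -lf etc/ssh_user_rsa_key
```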
Create an sshd_config that refers to the path to the above SSH keys (substitute your own username for mcallawa in the paths):

```
cat > ~/etc/sshd_config <<EOF
Port 22 # Override this by passing PORT to sshd_entrypoint.sh
HostKey /home/mcallawa/etc/ssh_host_rsa_key
PidFile /home/mcallawa/etc/sshd.pid
PasswordAuthentication no
ChallengeResponseAuthentication no
GSSAPICleanupCredentials no
EOF
```
Verify permissions on your ~/.ssh and ~/etc contents:
```
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chmod 700 ~/etc
chmod 600 ~/etc/*
```
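To double-check the resulting modes, stat prints them in octal (GNU coreutils syntax assumed):

```shell
# Expect 700 on the directories and 600 on the files
stat -c '%a %n' ~/.ssh ~/.ssh/authorized_keys ~/etc ~/etc/*
```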
Create a wrapper script to run sshd as your user:

```
cat > ~/etc/sshd_entrypoint.sh <<EOF
#!/bin/bash
PORT=\$1
while true; do
    echo Starting sshd as \$USER
    /usr/sbin/sshd -f etc/sshd_config -D -p \$PORT
    echo sshd exited...
    sleep 3
done
EOF
```
Now copy the keys etc/ssh_user_rsa_key and etc/ssh_host_rsa_key to the compute0 side, for example by displaying each file with cat and pasting the contents into a text editor on compute0.
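Cut and paste can silently mangle key material (wrapped lines, trailing whitespace), so it is worth comparing checksums afterwards; run this on both clusters and verify the digests match:

```shell
# The digests printed on compute0 must equal those printed on compute1
md5sum ~/etc/ssh_user_rsa_key ~/etc/ssh_host_rsa_key
```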
Launch an sshd job into compute1
Select a port between 8000 and 8999, and start your sshd, making note of the execution node it lands on.
Note here that we expect $HOME to have been passed into the Docker container for the job. This happens automatically if your current working directory is your $HOME, but if you are launching these jobs from elsewhere, you will have to add $HOME to the list in LSF_DOCKER_VOLUMES.
```
LSF_DOCKER_VOLUMES="/storage1/fs1/mcallawa:/storage1/fs1/mcallawa" \
LSF_DOCKER_PORTS='8200:8200' \
bsub -Is -G compute-ris -q general-interactive \
    -R 'select[port8200=1]' \
    -a 'docker(mcallaway/rsync:latest)' \
    bash ./etc/sshd_entrypoint.sh 8200
Job <60223> is submitted to queue <general-interactive>.
<<Waiting for dispatch ...>>
<<Starting on compute1-exec-163.ris.wustl.edu>>
latest: Pulling from mcallaway/rsync
5c939e3a4d10: Already exists
c63719cdbe7a: Already exists
19a861ea6baf: Already exists
651c9d2d6c4f: Already exists
bf91b5efbfd8: Pull complete
bb3a7dd7dc67: Pull complete
Digest: sha256:2366f9b805855764fa7202aaf3f29b5ced4c7af2463fa7570b9ea73a7eb72e58
Status: Downloaded newer image for mcallaway/rsync:latest
docker.io/mcallaway/rsync:latest
Starting sshd as mcallawa
```
Interactive jobs will soon have "runtime limits" on the order of 3 days (to be determined). Non-interactive (batch) jobs will have much longer runtime limits, likely 6 weeks. Be wary of "losing" (forgetting about) running jobs; they will be killed sooner or later, so be mindful of this in large (multi-day) transfers. Use log files to keep track of batch jobs, e.g. bsub -oo job.out -eo job.err … for stdout and stderr.
Launch an rsync job into compute0
Now launch an rsync job in compute0 to “push” to compute1. Note the use of environment variables here to specify the path from which your data is coming, the host and port involved, and the use of the “host” network for Docker:
```
export LSF_DOCKER_VOLUMES="/gscmnt/temp403:/gscmnt/temp403"
export LSF_DOCKER_NETWORK=host
bsub -q research-hpc -Is -a 'docker(mcallaway/rsync:latest)' bash
Job <2665534> is submitted to queue <research-hpc>.
<<Waiting for dispatch ...>>
<<Starting on blade18-2-2.gsc.wustl.edu>>
latest: Pulling from mcallaway/rsync
5c939e3a4d10: Already exists
c63719cdbe7a: Already exists
19a861ea6baf: Already exists
651c9d2d6c4f: Already exists
bf91b5efbfd8: Already exists
bb3a7dd7dc67: Pull complete
Digest: sha256:2366f9b805855764fa7202aaf3f29b5ced4c7af2463fa7570b9ea73a7eb72e58
Status: Downloaded newer image for mcallaway/rsync:latest
mcallawa@blade18-2-2:~$ HOST=compute1-exec-163.ris.wustl.edu  # The compute1 exec node above
mcallawa@blade18-2-2:~$ PORT=8200
mcallawa@blade18-2-2:~$ rsync --archive --whole-file --verbose --stats --progress \
    -e "ssh -p $PORT -i $HOME/.ssh/ssh_user_rsa_key" \
    /gscmnt/temp403/systems/git_srv.tar.gz \
    $USER@$HOST:/storage1/fs1/mcallawa/Active/data/
sending incremental file list
git_srv.tar.gz
 32,555,073,536  97%  124.27MB/s  0:00:05
```
The same warnings apply here with job duration, termination at runtime limits, and the use of output and error files.
You can also simply use scp:

```
scp -P $PORT -i $HOME/.ssh/ssh_user_rsa_key $SRC $USER@$HOST:$DEST
```
Note that rsync has some computational overhead; is tar over ssh faster?
Instead of using "rsync", one can also use "tar over ssh". Note here the use of "pv" to measure the rate of data crossing a pipe, providing a "progress bar":
```
mcallawa@blade18-2-2:~$ HOST=compute1-exec-163.ris.wustl.edu
mcallawa@blade18-2-2:~$ PORT=8200
mcallawa@blade18-2-2:~$ tar cf - /gscmnt/temp403/systems/mcallawa/data/ | pv | \
    ssh -p $PORT -i ~/.ssh/ssh_user_rsa_key $USER@$HOST 'tar xf -'
tar: Removing leading `/' from member names
4.27GiB 0:00:29 [ 148MiB/s] [ <=> ]
```
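The tar-over-ssh pattern is just tar writing to stdout on one side and reading stdin on the other; you can verify the round trip locally by replacing the ssh hop with a plain pipe:

```shell
# Local stand-in for: tar cf - SRC | ssh HOST 'tar xf -'
src=$(mktemp -d); dst=$(mktemp -d)
echo hello > "$src/file.txt"
tar -C "$src" -cf - . | tar -C "$dst" -xf -
diff -r "$src" "$dst" && echo "round trip OK"
```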
Caveats
- We observe a single, single-threaded rsync job to transfer at around 150 MB/s.
- Use more than one job across different hosts.
- Cumulative bandwidth between these two clusters is 2x40Gb.
- Launch several jobs across different pairs of hosts to parallelize, but remember this is a shared system; be mindful of others. We as a community need to try to "add up" to a cumulative total of about 80Gb of network consumption. This is hard without QoS tools.
- Be careful with your file paths; a trailing "/" changes rsync's behavior (copying a directory's contents rather than the directory itself), so strip it where needed.
- Rsync and tar will preserve symbolic links, whereas Globus and Samba do not.
- Many people are likely to use this process, so you'll need to pick a network port not already in use. Check whether a port is open by using bhosts:

```
# Show me all hosts in the "general" host group with port 8200 open
bhosts -w -R 'select[port8200=1]' general
```
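The parallelization caveat above can be sketched as one transfer per top-level subdirectory. This just enumerates candidate per-directory rsync commands (the SRC path is taken from the examples above; in practice each printed command would be wrapped in its own bsub so the jobs land on different hosts):

```shell
SRC=/gscmnt/temp403/systems/mcallawa/data
# Print one candidate rsync command per subdirectory of SRC;
# submit each via its own bsub to spread the load across hosts.
for d in "$SRC"/*/ ; do
    echo "rsync --archive $d \$USER@\$HOST:/storage1/fs1/mcallawa/Active/data/"
done
```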