Overview
This document explains how to make sure your Docker image uses the appropriate OFED version for the Scientific Compute Platform.
Installing the Correct Version
Shown below are example Dockerfile instructions that install the OFED 5.4 driver on Red Hat Enterprise Linux 7.7.
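    # OFED release, target OS, and CPU architecture used to build the download URL
    ENV MOFED_VERSION 5.4-3.1.0.0
    ENV OS_VERSION rhel7.7
    ENV PLATFORM x86_64
    # Directory the tarball extracts to; removed once the install finishes
    ENV MOFED_DIR MLNX_OFED_LINUX-${MOFED_VERSION}-${OS_VERSION}-${PLATFORM}
    # Install dependencies, download and unpack OFED, install the user-space
    # components only, then clean up inside /tmp
    RUN cd /tmp/ && \
        yum install -y pciutils numactl-libs gtk2 atk cairo gcc-gfortran tcsh lsof libnl3 libmnl ethtool tcl tk perl make libusbx fuse-libs && \
        wget -q http://content.mellanox.com/ofed/MLNX_OFED-${MOFED_VERSION}/MLNX_OFED_LINUX-${MOFED_VERSION}-${OS_VERSION}-${PLATFORM}.tgz && \
        tar -xvf MLNX_OFED_LINUX-${MOFED_VERSION}-${OS_VERSION}-${PLATFORM}.tgz && \
        MLNX_OFED_LINUX-${MOFED_VERSION}-${OS_VERSION}-${PLATFORM}/mlnxofedinstall --user-space-only --without-fw-update -q --distro rhel7.7 && \
        rm -rf ${MOFED_DIR} && \
        rm -rf *.tgz && \
        yum clean all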
The same OFED version (MOFED_VERSION 5.4-3.1.0.0) applies to Ubuntu; only the OS_VERSION value and the package manager commands differ. Shown below are the equivalent instructions for Ubuntu 20.04.
    # OFED release, target OS, and CPU architecture used to build the download URL
    ENV MOFED_VERSION 5.4-3.1.0.0
    ENV OS_VERSION ubuntu20.04
    ENV PLATFORM x86_64
    # Directory the tarball extracts to; removed once the install finishes
    ENV MOFED_DIR MLNX_OFED_LINUX-${MOFED_VERSION}-${OS_VERSION}-${PLATFORM}
    # Dependencies are listed under their Ubuntu package names
    RUN cd /tmp/ && apt-get update && \
        DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends pciutils libnuma1 libgtk2.0-0 libatk1.0-0 libcairo2 gfortran tcsh lsof libnl-3-200 libnl-route-3-200 libmnl0 ethtool tcl tk perl make libusb-1.0-0 libfuse2 && \
        wget -q http://content.mellanox.com/ofed/MLNX_OFED-${MOFED_VERSION}/MLNX_OFED_LINUX-${MOFED_VERSION}-${OS_VERSION}-${PLATFORM}.tgz && \
        tar -xvf MLNX_OFED_LINUX-${MOFED_VERSION}-${OS_VERSION}-${PLATFORM}.tgz && \
        MLNX_OFED_LINUX-${MOFED_VERSION}-${OS_VERSION}-${PLATFORM}/mlnxofedinstall --user-space-only --without-fw-update -q --distro ubuntu20.04 && \
        rm -rf ${MOFED_DIR} && \
        rm -rf *.tgz && \
        apt-get clean
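When adapting these instructions to another OS or OFED release, it can help to see exactly which URL the wget line will request before starting a build. The following is a minimal sketch, not part of the official instructions: `mofed_url` is a hypothetical helper name, and it only reproduces the naming pattern used in the Dockerfile instructions above.

```shell
#!/bin/sh
# Hypothetical helper: builds the MLNX_OFED tarball URL using the same
# naming pattern as the wget line in the Dockerfile instructions.
# Arguments: $1 = MOFED version, $2 = OS version, $3 = platform.
mofed_url() {
    printf 'http://content.mellanox.com/ofed/MLNX_OFED-%s/MLNX_OFED_LINUX-%s-%s-%s.tgz\n' \
        "$1" "$1" "$2" "$3"
}

# Print the URLs for the two examples in this section.
mofed_url 5.4-3.1.0.0 rhel7.7 x86_64
mofed_url 5.4-3.1.0.0 ubuntu20.04 x86_64
```

To check that a tarball exists without downloading it, the printed URL can be passed to `wget -q --spider`; a non-zero exit status suggests that the version and OS combination is not available at that location.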
Once you have the correct OFED version installation code in your Dockerfile, you can build and push the image as you normally would.
Testing Your Image
Follow the steps below to run a test job.
- Create a bsub file called test.bsub as shown below. Replace <Docker image tag> with your Docker image tag and <MPI program> with the MPI program you want to run.
    #BSUB -q subscription
    #BSUB -R "span[ptile=1]"
    #BSUB -a "docker(<Docker image tag>)"
    #BSUB -G compute-ris
    #BSUB -oo lsf-%J.log
    mpirun -np $NP <MPI program>
- Run your test. Shown below is an example command. Replace <Number of processes> with the number of exec nodes to run the test on.
    export NP=<Number of processes> && \
    LSF_DOCKER_NETWORK=host \
    LSF_DOCKER_IPC=host \
    LSF_DOCKER_SHM_SIZE=20G \
    bsub -n $NP < test.bsub
- A test script is also available at https://github.com/WashU-IT-RIS/docker-osu-micro-benchmarks.git. Shown below are the instructions for running an OSU Benchmark test.
  - Clone the repository.
      git clone https://github.com/WashU-IT-RIS/docker-osu-micro-benchmarks.git
  - Change directory to docker-osu-micro-benchmarks.
      cd docker-osu-micro-benchmarks
  - Run an OSU Benchmark test.
    - Replace <test> with the OSU test that you want to run, for example osu_bw for the OSU bandwidth test.
    - Replace <compute-group> with the compute group you are a member of.
      QUEUE=subscription bin/osu-test.sh <test> -G <compute-group>
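The test.bsub file from the first step can also be generated rather than edited by hand, which helps when testing several images. The following is a minimal sketch under that assumption: `render_bsub` is a hypothetical helper name, and the image tag and program in the example call are made-up values, not real images.

```shell
#!/bin/sh
# Hypothetical helper: prints the contents of test.bsub with the two
# placeholders filled in. $1 = Docker image tag, $2 = MPI program.
# \$NP is escaped so that it stays literal and is expanded at job run time.
render_bsub() {
    cat <<EOF
#BSUB -q subscription
#BSUB -R "span[ptile=1]"
#BSUB -a "docker($1)"
#BSUB -G compute-ris
#BSUB -oo lsf-%J.log
mpirun -np \$NP $2
EOF
}

# Example call with made-up values; replace them with your own.
render_bsub "example/mpi-image:latest" "./hello_mpi"
```

Redirecting the output to a file, for example `render_bsub <Docker image tag> <MPI program> > test.bsub`, produces a file that can be submitted with the bsub command shown in the second step.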
Docker Images Identified That Utilize OFED
If a Docker image you use appears here, you will likely need to update your image.
Current as of 6/10/22
Docker Image | OFED Version |
---|---|
gcr.io/ris-registry-shared/base-terminal | 4.7-3.2.9.0 |
gcr.io/ris-registry-shared/base-terminal:latest | 4.7-3.2.9.0 |
gcr.io/ris-registry-shared/base-x | 4.7-3.2.9.0 |
gcr.io/ris-registry-shared/base-x-cuda | 4.7-3.2.9.0 |
ruikang/api_wrf | 4.9-2.2.4.0 |
ruikang/api_wrf:latest | 4.9-2.2.4.0 |
us.gcr.io/ris-appeng-shared-dev/bayly-nli:centos7 | 4.9-2.2.4.0 |
us.gcr.io/ris-appeng-shared-dev/compiler-base:oneapi2021.1.1_centos7 | 4.9-2.2.4.0 |
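To decide whether an image not listed here is affected, its OFED version can be compared against the version used in the Dockerfile instructions. The following is a minimal sketch that assumes 5.4-3.1.0.0 is the target and that any version sorting before it calls for an update; `ofed_is_older` is a hypothetical helper name, and the comparison relies on the `-V` (version sort) option of GNU sort.

```shell
#!/bin/sh
# Assumption: the target is the MOFED_VERSION used in this document.
TARGET_OFED=5.4-3.1.0.0

# Hypothetical helper: succeeds (exit status 0) when the version in $1
# sorts strictly before TARGET_OFED. Requires GNU sort for the -V option.
ofed_is_older() {
    [ "$1" != "$TARGET_OFED" ] &&
        [ "$(printf '%s\n' "$1" "$TARGET_OFED" | sort -V | head -n 1)" = "$1" ]
}

# Check the versions that appear in the table, plus the target itself.
for v in 4.7-3.2.9.0 4.9-2.2.4.0 5.4-3.1.0.0; do
    if ofed_is_older "$v"; then
        echo "$v: older than $TARGET_OFED"
    else
        echo "$v: at or above $TARGET_OFED"
    fi
done
```

The version of an existing image can usually be read by running `ofed_info -s` inside the container, assuming the OFED scripts are installed on the image's PATH.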