RIS-Vetted Application Containers

This page provides basic information and user-friendly guides for running RIS-vetted application containers on the Compute Service. It is by no means an exhaustive list of usable applications and is intended solely as a starting point for new users.

This page does not cover application containers whose code the user does not want, or is not permitted, to publish in a public registry.

The application containers listed on this page were vetted by starting a shell in an interactive job and were confirmed to be working at the time of testing. The security of non-RIS-hosted images has not been tested and is not guaranteed.

We therefore recommend using RIS-hosted Docker images when possible. Please visit this page for a list of RIS-hosted Docker images.

To successfully run bsub commands on this page, you must already be logged into the Compute Service:

> ssh ${compute_username}@compute1-client-1.ris.wustl.edu

If you are not connected to a Medical School network (WUSM-secure, MSVPN) or the Danforth VPN, you will need to set up port forwarding before accessing a GUI (e.g. a Jupyter notebook or RStudio session). Documentation on forwarding ports is found here.
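As a rough illustration of what such a port forward might look like, the sketch below uses plain OpenSSH local forwarding (`ssh -L`). It only prints the command so it can be reviewed before running. The helper name `forward_cmd`, the username `myuser`, and the choice to tunnel through compute1-client-1 are assumptions made for illustration; the linked port-forwarding documentation remains the authority.

```shell
# Print (do not run) an OpenSSH local port-forward command.
# Assumption: the GUI on the exec node is reached by tunnelling through
# compute1-client-1; follow the RIS documentation if it says otherwise.
# Usage: forward_cmd <compute_username> <exec-node> <port>
forward_cmd() {
    user=$1
    node=$2
    port=$3
    printf 'ssh -L %s:%s:%s %s@compute1-client-1.ris.wustl.edu\n' \
        "$port" "$node" "$port" "$user"
}

# Placeholder values: replace myuser and nn with your own.
forward_cmd myuser compute1-exec-nn.ris.wustl.edu 8888
```

With a tunnel like this in place, the browser would be pointed at the same port on localhost instead of at the exec node.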

Applications, Packages and Environment Images

.NET Core

  • Registry Location: https://hub.docker.com/_/microsoft-dotnet-core-sdk/

  • “.NET Core is an open-source, general-purpose development platform maintained by Microsoft and the .NET community on GitHub. It’s cross-platform (supporting Windows, macOS, and Linux) and can be used to build device, cloud, and IoT applications.” - Source: https://docs.microsoft.com/en-us/dotnet/core/

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(mcr.microsoft.com/dotnet/core/sdk)' /bin/bash

Anaconda

  • Registry Location:
  • “Anaconda is the leading open data science platform powered by Python. The open source version of Anaconda is a high performance distribution and includes over 100 of the most popular Python packages for data science. Additionally, it provides access to over 720 Python and R packages that can easily be installed using the conda dependency and environment manager, which is included in Anaconda.” - Source: https://hub.docker.com/r/continuumio/anaconda3

  • Run interactive job:
    # Using Python 3.5:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(continuumio/anaconda3)' /bin/bash
    
    # Using Python 2.7:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(continuumio/anaconda)' /bin/bash

AnnovarR

  • Registry Location: https://registry.hub.docker.com/r/bioinstaller/annovarr

  • “The annovarR package provides R functions as well as database resources which offer an integrated framework to annotate genetic variants from genome and transcriptome data. The wrapper functions of annovarR unified the interface of many published annotation tools, such as VEP, ANNOVAR, vcfanno and AnnotationDbi.” - Source: https://registry.hub.docker.com/r/bioinstaller/annovarr

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(bioinstaller/annovarr)' R

BamTools

BCFtools

  • Registry Location: https://bioconda.github.io/recipes/bcftools/README.html

  • “BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.” - Source: https://bioconda.github.io/recipes/bcftools/README.html

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/bcftools:1.10.2--hd2cd319_0)' /bin/bash

bedtools

BLAST

Bowtie

  • Registry Location: https://bioconda.github.io/recipes/bowtie/README.html

  • “Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).” - Source: http://bowtie-bio.sourceforge.net/index.shtml

  • Run Interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/bowtie:1.2.3--py37hc9558a2_0)' /bin/bash

Bowtie2

  • Registry Location: https://quay.io/repository/biocontainers/bowtie2?tab=info

  • “Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.” - Source: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

  • Run interactive job:
    > LSF_DOCKER_PRESERVE_ENVIRONMENT=false bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/bowtie2:2.4.1--py38he513fc3_0)' /bin/bash

BreakDancer

  • Registry Location: https://bioconda.github.io/recipes/breakdancer/README.html

  • “BreakDancer-1.3.6, is a Cpp package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads. It includes two complementary programs. BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation. BreakDancerMini focuses on detecting small indels (usually between 10bp and 100bp) using normally mapped read pairs.” - Source: https://github.com/genome/breakdancer

  • Run Interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/breakdancer:1.4.5--2)' /bin/bash

BWA

  • Registry Location: https://bioconda.github.io/recipes/bwa/README.html

  • “BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.” - Source: https://github.com/lh3/bwa

  • Run Interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/bwa:0.7.8--hed695b0_5)' /bin/bash

Cell Ranger

Circos

  • Registry Location: https://bioconda.github.io/recipes/circos/README.html

  • “Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions.” - Source: http://circos.ca/

  • Run Interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/circos:0.69.8--0)' /bin/bash

cn.mops

CNVkit

Cufflinks

  • Registry Location: https://bioconda.github.io/recipes/cufflinks/README.html

  • “Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.” - Source: http://cole-trapnell-lab.github.io/cufflinks/

  • Run Interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/cufflinks:2.2.1--py27_2)' /bin/bash

edgeR

  • Registry Location: https://bioconda.github.io/recipes/bioconductor-edger/README.html

  • “Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE.” - Source: https://bioconductor.org/packages/3.10/bioc/html/edgeR.html

  • Run Interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/bioconductor-edger:3.28.0--r36he1b5a44_0)' R

Ensembl VEP

freebayes

  • Registry Location: https://bioconda.github.io/recipes/freebayes/README.html

  • “freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.” - Source: https://github.com/ekg/freebayes

  • Run Interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/freebayes:1.3.1--py37h56106d0_0)' /bin/bash

GenomicRanges

  • Registry Location: https://bioconda.github.io/recipes/bioconductor-genomicranges/README.html

  • “The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.” - Source: https://bioconductor.org/packages/3.10/bioc/html/GenomicRanges.html

  • Run Interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/bioconductor-genomicranges:1.38.0--r36h516909a_0)' R

GenVisR

Go

GROMACS

  • Registry Location: https://hub.docker.com/r/gromacs/gromacs

  • “GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.” - Source: http://www.gromacs.org/About_Gromacs

  • Run interactive job:
    > export PATH=$PATH:/gromacs/bin
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(gromacs/gromacs)' /bin/bash

HISAT2

  • Registry Location: https://bioconda.github.io/recipes/hisat2/README.html

  • “HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). Based on an extension of BWT for graphs [Sirén et al. 2014], we designed and implemented a graph FM index (GFM), an original approach and its first implementation to the best of our knowledge. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).” - Source: https://ccb.jhu.edu/software/hisat2/index.shtml

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/hisat2:2.1.0--py37hc9558a2_4)' /bin/bash

HOOMD

  • Registry Location: https://hub.docker.com/r/glotzerlab/software/

  • “HOOMD-blue is a general-purpose particle simulation toolkit. It scales from a single CPU core to thousands of GPUs. You define particle initial conditions and interactions in a high-level python script. Then tell HOOMD-blue how you want to execute the job and it takes care of the rest. Python job scripts give you unlimited flexibility to create custom initialization routines, control simulation parameters, and perform in situ analysis.” - Source: http://glotzerlab.engin.umich.edu/home/resources/

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(glotzerlab/software)' /bin/bash

IRanges

  • Registry Location: https://bioconda.github.io/recipes/bioconductor-iranges/README.html

  • “Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.” - Source: https://bioconductor.org/packages/3.10/bioc/html/IRanges.html

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/bioconductor-iranges:2.20.0--r36h516909a_0)' R

Julia

  • Registry Location: https://hub.docker.com/_/julia

  • “Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library.” - Source: https://julialang.org/

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(julia)' /bin/bash

Jupyter

  • Registry Location: https://hub.docker.com/r/jupyter/datascience-notebook

  • “Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages. The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data.” - Source: https://jupyter.org

  • “jupyter/datascience-notebook includes libraries for data analysis from the Julia, Python, and R communities.” - Source: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-datascience-notebook

  • Run interactive job:

    • Choose a port number from 8000 to 8999 to be able to connect your browser to. In the example below, this value is set to 8888.

      • Please see our documentation for more information on selecting a port.

    • Submit the interactive job:

      # Running Jupyter Notebook
      > LSF_DOCKER_PORTS='8888:8888' PATH="/opt/conda/bin:$PATH" bsub -G ${group_name} -Is -q general-interactive -R 'select[port8888=1]' -a 'docker(jupyter/datascience-notebook:ubuntu-20.04)' jupyter-notebook --ip=0.0.0.0 --NotebookApp.allow_origin=*
      
      # Running JupyterLab
      > JUPYTER_ENABLE_LAB=True LSF_DOCKER_PORTS='8888:8888' PATH="/opt/conda/bin:$PATH" bsub -G ${group_name} -Is -q general-interactive -R 'select[port8888=1]' -a 'docker(jupyter/datascience-notebook:ubuntu-20.04)' jupyter-lab --ip=0.0.0.0 --NotebookApp.allow_origin=*
    • Connect to your server by pointing your web browser to the node, port and token from the output of the interactive job.

      http://compute1-exec-nn.ris.wustl.edu:8888/?token=<48-character token>

    • When you are finished with your server, use Ctrl+C to stop this server (twice to skip confirmation) and “exit” the interactive shell.

  • If you encounter an error stating your shell has not been properly configured, please see this section on initializing your shell to use conda.

  • If you encounter an error during startup, specifically with applications such as fluxbox, x11vnc, or xterm, and see something like “(exited status 1; not expected)” in your terminal, you will need to ssh into the same host again. That is, SSH from your local machine to compute1, then from compute1 to compute1 again, and rerun your job. For example, if you are currently on compute1-client-1.ris.wustl.edu: localhost –> compute1-client-1 –> compute1-client-1. These issues appear to be caused by incompatible local terminal configuration (MobaXTerm for Windows, Terminal for Mac) being carried forward and applied on the compute1 system.

GPU-enabled Jupyter

GPU-enabled Jupyter notebooks are available via the RAPIDS Docker image.

maftools

Manta

monocle

  • Registry Location: https://bioconda.github.io/recipes/bioconductor-monocle/README.html

  • “Monocle performs differential expression and time-series analysis for single-cell expression experiments. It orders individual cells according to progress through a biological process, without knowing ahead of time which genes define progress through that process. Monocle also performs differential expression analysis, clustering, visualization, and other useful tasks on single cell expression data. It is designed to work with RNA-Seq and qPCR data, but could be used with other types as well.” - Source: https://bioconductor.org/packages/release/bioc/html/monocle.html

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/bioconductor-monocle:2.14.0--r36he1b5a44_1)' R

NovoAlign

OpenJDK (Java)

  • Registry Location: https://hub.docker.com/_/openjdk

  • “OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE).” - Source: https://en.wikipedia.org/wiki/OpenJDK

  • Run interactive job:
    # Using Java 11:
    > LSF_DOCKER_PRESERVE_ENVIRONMENT=false bsub -G ${group_name} -Is -q general-interactive -a 'docker(openjdk:11-slim)' /bin/bash
    
    # Using Java 8:
    > LSF_DOCKER_PRESERVE_ENVIRONMENT=false bsub -G ${group_name} -Is -q general-interactive -a 'docker(openjdk:8-slim)' /bin/bash

Oncotator

Organism.dplyr

Perl

  • Registry Location: https://hub.docker.com/_/perl

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(perl:slim)' /bin/bash

Picard

Pindel

  • Registry Location: https://bioconda.github.io/recipes/pindel/README.html

  • “Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-based resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.” - Source: http://gmt.genome.wustl.edu/packages/pindel/index.html

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/pindel:0.2.5b9--he527e40_3)' /bin/bash

PLINK

  • Registry Location: https://bioconda.github.io/recipes/plink/README.html

  • “PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.” - Source: http://zzz.bwh.harvard.edu/plink/

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/plink:1.90b6.12--heea4ae3_0)' /bin/bash

PLINK2

  • Registry Location: https://hub.docker.com/r/skwalker/plink2

  • “The new release…is a complete rewrite of the original code and represents a very significant improvement in overall speed and functionality. Moving forward, these changes should enable PLINK to meet the demands of ever-larger genetic datasets.” - Source: https://zzz.bwh.harvard.edu/plink/plink2.shtml

  • Run interactive job:
    > PATH=$PATH:/usr bsub -G ${group_name} -Is -q general-interactive -a 'docker(skwalker/plink2)' /bin/bash

PRSice-2

  • Registry Location: https://hub.docker.com/r/lifebitai/prsice2

  • “PRSice (pronounced ‘precise’) is a Polygenic Risk Score software for calculating, applying, evaluating and plotting the results of polygenic risk scores (PRS) analyses.” - Source: https://www.prsice.info/

  • Run interactive job:
    > PATH="/opt/conda/bin:$PATH" bsub -G ${group_name} -Is -q general-interactive -a "docker(lifebitai/prsice2)" /bin/bash
  • Activate conda environment:
    > conda activate prs
    • You will now be able to use PRSice.R and PRSice_linux commands.

  • If you encounter an error stating your shell has not been properly configured, please see this section on initializing your shell to use conda.

Python

  • Registry Location: https://hub.docker.com/_/python

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(python:slim)' /bin/bash

QIIME2

  • Registry Location: https://hub.docker.com/r/qiime2/core

  • “QIIME 2™ is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed.” - Source: https://qiime2.org/

  • Run interactive job:
    > LSF_DOCKER_PRESERVE_ENVIRONMENT=false bsub -G ${group_name} -Is -q general-interactive -a 'docker(qiime2/core:2020.2)' /bin/bash

R

  • Registry Location: https://hub.docker.com/_/r-base

  • “R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.” - Source: https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-is-R_003f

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(r-base:<tag>)' R

    Where <tag> is replaced by the version tag you wish to use. These can be found on the r-base Docker Hub page.
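To make the tag substitution concrete, here is a small sketch that prints the submission command for a chosen r-base tag. The helper name `r_base_cmd`, the group name `compute-mylab`, and the tag `4.0.3` are placeholders for illustration only; check the r-base Docker Hub page for the tags that actually exist.

```shell
# Print the r-base submission command for a given image tag.
# Usage: r_base_cmd <group_name> <tag>
r_base_cmd() {
    group=$1
    tag=$2
    printf "bsub -G %s -Is -q general-interactive -a 'docker(r-base:%s)' R\n" \
        "$group" "$tag"
}

# Placeholder group and tag; substitute your own values.
r_base_cmd compute-mylab 4.0.3
```

Printing the command first makes it easy to confirm that the tag landed inside the docker(...) argument before submitting.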

R Version Issues

  • There are known issues with the latest Debian image that affect the r-base images and cause some commands to misbehave when working with the Storage Platform.

  • The latest version (4.1.2) also does not run R as expected; users should use an earlier version.

  • The RIS Team is aware of these issues and is working on fixes.

RAPIDS GPU-enabled Jupyter

  • Registry Location: https://hub.docker.com/r/rapidsai/rapidsai

  • “The RAPIDS suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.” - Source: https://rapids.ai/about.html

  • Run interactive job:

    • Choose a port number from 8000 to 8999 to be able to connect your browser to. In the example, this value is set to 8888.

      • Please see our documentation for more information on selecting a port.

    • Submit the interactive job:

    > LSF_DOCKER_PORTS="8888:8888" PATH="/opt/conda/bin:$PATH"  bsub -Is -q general-interactive -R 'select[gpuhost,port8888=1]' -gpu "num=1:gmodel=TeslaV100_SXM2_32GB" -a 'docker(rapidsai/rapidsai:0.18-cuda11.0-runtime-ubuntu20.04-py3.8)' /bin/bash
    • Connect to your server by pointing your web browser to the node, port and token from the output of the interactive job.

      http://compute1-exec-nn.ris.wustl.edu:8888

    • When you are finished with your server, use Ctrl+C to stop this server (twice to skip confirmation) and “exit” the interactive shell.

  • If you encounter an error stating your shell has not been properly configured, please see this section on initializing your shell to use conda.

Setting Up A Jupyter Notebook Password

It is recommended to password-protect your Jupyter sessions. This can be done after submitting a RAPIDS interactive job. After your interactive job has landed, please follow the instructions below.

  1. Start an IPython session.

> ipython
  2. Enter the following commands at the IPython prompt, each on its own line.

In [1]: from notebook.auth import passwd
In [2]: passwd()
  3. You will be prompted to enter and verify a password. The result will be a string of characters.

Out[2]: 'argon2:$argon2id$v=19$m=10241,t=10,p=8$+ASmr7ZxR7Glc8ZVu+fn6g$Ehusp4AdgQjv/zq5AWFy0g'
  4. Copy the string of characters, then exit IPython and the RAPIDS interactive job.

In [3]: quit
> exit
  5. Open the Jupyter configuration file with Vim.

For help on using Vim, please see this resource.

> vim ~/.jupyter/jupyter_notebook_config.py
  6. Add the copied string of characters to the c.NotebookApp.password variable. The file should look like the example below (replacing the string after u with the string you copied):

c.NotebookApp.password = u'argon2:$argon2id$v=19$m=10241,t=10,p=8$+ASmr7ZxR7Glc8ZVu+fn6g$Ehusp4AdgQjv/zq5AWFy0g'
  7. Save the file and exit Vim.

  8. From now on, when you submit a job that uses Jupyter notebook, you will be prompted to enter your password.

  9. If you ever forget the password, delete the configuration file and start over.

> rm ~/.jupyter/jupyter_notebook_config.py
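After editing the configuration file, it can be useful to confirm that the password line is actually in place. The following is a minimal sketch under the assumption that the file lives at the path used above; the helper name `has_notebook_password` is hypothetical, and the check only looks for an uncommented assignment without validating the hash itself.

```shell
# Return success if the Jupyter config file sets a notebook password.
# Usage: has_notebook_password [config-file]
has_notebook_password() {
    config=${1:-$HOME/.jupyter/jupyter_notebook_config.py}
    # Match an uncommented assignment to c.NotebookApp.password.
    grep -q '^[[:space:]]*c\.NotebookApp\.password[[:space:]]*=' "$config" 2>/dev/null
}

if has_notebook_password; then
    echo "Jupyter password is configured"
else
    echo "No Jupyter password found"
fi
```

A missing file is treated the same as a missing password line, so the check also works right after deleting the configuration file.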

RStudio

Version 4.0.2

  • Note this is vetted only for image version up to 4.0.2. Use koetjen/rstudio:4.0.3 for R version 4.0.3 and RStudio version 1.3.1093.

  • An initial setup of dependent files and directory structure is required. Please see this section for details.

  • Choose a port number from 8001 to 8999 to be able to connect your browser to.

    • Please see our documentation for more information on selecting a port.

  • In the example, this value is set to 8081

  • Note: you are telling compute1 to reserve this port exclusively for you on the node.

  • Run interactive job:
    > LSF_DOCKER_VOLUMES='/home/${compute_username}:/home/${compute_username}' PATH=/home/${compute_username}:$PATH LSF_DOCKER_PORTS='8081:8787' bsub -Is -G ${group_name} -q general-interactive -R 'select[port8081=1]' -a 'docker(rocker/verse:4.0.2)' /bin/bash
    > rstudio-server start

Note

The node your job lands on typically appears in the job output as <<Starting on compute1-exec-nn.ris.wustl.edu>>, where nn is the node number.

  • Connect to your server using the node and port from the interactive job, replacing nn with the node your job landed on and ${port} with the port chosen earlier. If you followed the example interactive job, the port is 8081:

  • When you are finished with your server, use Ctrl+C to stop it and “exit” the interactive shell.

Version 4.1.0

Requirements

Please follow the initial setup steps for RStudio Version 4.0.2 before continuing.

  • Create a directory for the RStudio database files
    > mkdir -p $HOME/rstudio_db
  • Run an interactive job using port 8081 (please see our documentation for more information on selecting a port for RStudio).

    > LSF_DOCKER_VOLUMES="$HOME:$HOME $HOME/rstudio_db/:/var/lib/rstudio-server" PATH="$HOME:$PATH" LSF_DOCKER_PORTS='8081:8787' bsub -Is -G ${group_name} -q general-interactive -R 'select[port8081=1]' -a 'docker(rocker/verse:4.1.0)' /bin/bash
    > rstudio-server start

Note

The node your job lands on typically appears in the job output as <<Starting on compute1-exec-nn.ris.wustl.edu>>, where nn is the node number.

  • Connect to your RStudio 4.1.0 job using the same directions for version 4.0.2.

samtools

  • Registry Location: https://bioconda.github.io/recipes/samtools/README.html

  • “Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories: Samtools - Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format, BCFtools - Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants, HTSlib - A C library for reading/writing high-throughput sequencing data” - Source: https://www.htslib.org/

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/samtools:1.2-0)' /bin/bash

seurat-scripts

Shiny

SingleR

SnpSift

STAR

SRA Toolkit

  • Registry Location:

  • “The SRA Toolkit, and the source-code SRA System Development Kit (SDK), will allow you to programmatically access data housed within SRA and convert it from the SRA format to the following formats: ABI SOLiD native (colorspace fasta / qual), fasta, fastq, sff, sam (human-readable bam, aligned or unaligned), Illumina native…” - Source: https://www.ncbi.nlm.nih.gov/books/NBK158900/

  • Run interactive job:
    > PATH="/usr/local/ncbi/sra-tools/bin:$PATH" bsub -G ${group_name} -Is -q general-interactive -a "docker(inutano/sra-toolkit:latest)" /bin/sh

Strelka2

TopHat

  • Registry Location: https://bioconda.github.io/recipes/tophat/README.html

  • “TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.” - Source: http://ccb.jhu.edu/software/tophat/index.shtml

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/tophat:2.1.1--py27_3)' /bin/bash

Valgrind

  • Registry Location: https://hub.docker.com/r/greymail/gcc-cmake-valgrind

  • “Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.” - Source: https://valgrind.org/

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(greymail/gcc-cmake-valgrind:0.1)' /bin/bash

Varscan

  • Registry Location: https://bioconda.github.io/recipes/varscan/README.html

  • “VarScan is a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. The newest version, VarScan 2, is written in Java, so it runs on most operating systems. It can be used to detect different types of variation.” - Source: http://dkoboldt.github.io/varscan/

  • Run interactive job:
    > bsub -G ${group_name} -Is -q general-interactive -a 'docker(quay.io/biocontainers/varscan:2.3.7--3)' /bin/bash

VCFtools

Velocyto

Anaconda Environment Errors

If you encounter the following error when invoking the conda command:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.

You will need to initialize your shell for use with Anaconda. This only has to be done once. Copy and paste the code block below into your terminal.

  • In rare cases, $HOME is not /home/${compute_username}. See the quick-start for details.

  • If this is the case, run echo $HOME and use the resulting path in the appropriate places below.

cat <<'EOF' >> $HOME/.bashrc
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/conda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/conda/etc/profile.d/conda.sh" ]; then
        . "/opt/conda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/conda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
EOF

Re-initialize your bashrc

conda init bash
source $HOME/.bashrc

You should now be able to activate an Anaconda environment.
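Because the initialization only has to be done once, it may help to check whether the conda block is already present before appending it again. The sketch below assumes the marker line shown in the block above; the helper name `conda_block_present` is hypothetical.

```shell
# Return success if the conda initialize block is already in the file.
# Usage: conda_block_present [bashrc-file]
conda_block_present() {
    rcfile=${1:-$HOME/.bashrc}
    # -F treats the marker as a fixed string rather than a pattern.
    grep -qF '# >>> conda initialize >>>' "$rcfile" 2>/dev/null
}

if conda_block_present; then
    echo "conda block already present; skip the append step"
else
    echo "conda block not found; the append step is still needed"
fi
```

Running this check first avoids ending up with duplicate conda blocks in the bashrc.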

Port Selection

Some applications require access to a port inside the running Docker container to run an application. Two examples would be Jupyter and RStudio, which use ports 8888 and 8787 respectively. When submitting a job, a port is exposed and forwarded to a port inside the Docker container using the LSF_DOCKER_PORTS environment variable. Please see our documentation for more information on using LSF_DOCKER_PORTS.

The Jupyter example job submissions expose port 8888 to access the running Jupyter Docker container, while the RStudio example job submissions expose port 8081. If you choose to use a different port, you will also need to update the job submission command. This is true for all jobs requiring access to a port.

For example, if an application is running on port 8999, and you would like to expose port 8001 to access the application running in the Docker container on port 8999, you would need to update the job submission command as follows:

Modify LSF_DOCKER_PORTS:

export LSF_DOCKER_PORTS="8001:8999"

Modify the LSF port resource request:

-R "select[port8001=1]"

Notice that only the first port in LSF_DOCKER_PORTS changes, since port 8999 is the port the Docker image maintainer chose for the application. Any port from 8000 to 8999 can be exposed.

If the application requires access via a web browser, the port in the URL will also need to be changed. As an example, to access a Jupyter session using port 8001, the URL would need to be changed to:

http://compute1-exec-nn.ris.wustl.edu:8001/?token=<48-character token>
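The three places that must agree on the exposed port (LSF_DOCKER_PORTS, the LSF resource request, and the browser URL) can be derived from a single value. The following is a small sketch of that bookkeeping; the helper name `port_settings` is hypothetical, and the range check simply mirrors the 8000 to 8999 range stated above.

```shell
# Print the LSF_DOCKER_PORTS value, the LSF resource request, and the
# base browser URL for one exposed port.
# Usage: port_settings <exposed-port> <container-port> <exec-node>
port_settings() {
    exposed=$1
    container=$2
    node=$3
    # Only ports 8000-8999 can be exposed, per the text above.
    if [ "$exposed" -lt 8000 ] || [ "$exposed" -gt 8999 ]; then
        echo "exposed port must be between 8000 and 8999" >&2
        return 1
    fi
    printf 'LSF_DOCKER_PORTS="%s:%s"\n' "$exposed" "$container"
    printf '%s\n' "-R \"select[port${exposed}=1]\""
    printf 'http://%s:%s\n' "$node" "$exposed"
}

# Expose port 8001 for an application listening on 8999 in the container.
port_settings 8001 8999 compute1-exec-nn.ris.wustl.edu
```

Only the exposed port appears in the resource request and the URL; the container port is used solely on the right-hand side of LSF_DOCKER_PORTS.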
