AlphaFold Quickstart: Earlier Versions
This page contains a quick start guide for earlier versions of AlphaFold that are still available but no longer directly supported. Please refer to the latest version for direct support.
AlphaFold 2.0.0
Software Included
-
AlphaFold (https://github.com/deepmind/alphafold)
Getting Started
-
Connect to the compute client.
ssh wustlkey@compute1-client-1.ris.wustl.edu
-
Prepare the computing environment before submitting an AlphaFold job.
# Set the AlphaFold base directory
export ALPHAFOLD_BASE_DIR=/app/alphafold
# Use the scratch file system for temp space
export SCRATCH1=/scratch1/fs1/${COMPUTE_ALLOCATION}
# Use your Active storage for input and output data
export STORAGE1=/storage1/fs1/${STORAGE_ALLOCATION}/Active
# Mount scratch, Active storage, AlphaFold reference databases and your home directory
export LSF_DOCKER_VOLUMES="/scratch1/fs1/ris/references/AlphaFold:/scratch1/fs1/ris/references/AlphaFold $SCRATCH1:$SCRATCH1 $STORAGE1:$STORAGE1 $HOME:$HOME"
# Update $PATH with folders containing AlphaFold, CUDA, and conda executables
export PATH="/usr/local/cuda/bin/:/opt/conda/bin:/app/alphafold:$PATH"
# Use the debug flag when trying to figure out why your job failed to launch on the cluster
#export LSF_DOCKER_RUN_LOGLEVEL=DEBUG
-
Submit an AlphaFold job that requests a node with 8 vCPUs, 8 GB of memory, and one GPU.
-
These are the minimum system requirements suggested for running AlphaFold with the reduced_dbs setting.
-
bsub -q general -n 8 -M 8GB -R "gpuhost rusage[mem=8GB] span[hosts=1]" -gpu 'num=1' -a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)" run_alphafold.sh -o /path/to/output/folder -m model_1,model_2,model_3,model_4,model_5,model_2_ptm -f /path/to/input/protein_sequence.fa -t 2021-08-18 -n 8 -p reduced_dbs
-
AlphaFold can run on both the V100 and A100 GPU architectures. To specify the GPU architecture, modify the -gpu argument in the job submission command.
-gpu 'num=1:gmodel=<gpu_model>'
-
A list of GPU models can be found here.
-
Jobs can be managed using job groups. Job groups are a way to submit a large number of jobs at once.
-
Jobs can be submitted to a condo, if available, by specifying the correct condo queue. Information on this can be found here.
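The submission step above can be wrapped in a small helper so that the -gpu value is assembled in one place. This is a minimal sketch, not part of the RIS tooling: gpu_spec, submit_alphafold, and DRY_RUN are illustrative names, and the resource values are copied from the reduced_dbs example in this guide. By default the helper only prints the command for review.

```bash
# Sketch only: gpu_spec, submit_alphafold and DRY_RUN are illustrative names,
# not part of LSF or AlphaFold.

# Print the value for bsub's -gpu option; gmodel is added only when a model is given.
gpu_spec() {
    if [ -n "${1:-}" ]; then
        printf 'num=1:gmodel=%s\n' "$1"
    else
        printf 'num=1\n'
    fi
}

# Build the reduced_dbs submission shown in this guide.
# Usage: submit_alphafold <output_dir> <fasta_path> [gpu_model]
# Prints the command unless DRY_RUN=0 is set, in which case it runs bsub.
submit_alphafold() {
    af_output_dir=$1
    af_fasta_path=$2
    af_gpu=$(gpu_spec "${3:-}")
    # Reuse the positional parameters as the argument list so quoting is preserved.
    set -- bsub -q general -n 8 -M 8GB \
        -R "gpuhost rusage[mem=8GB] span[hosts=1]" \
        -gpu "$af_gpu" \
        -a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)" \
        run_alphafold.sh \
        -o "$af_output_dir" \
        -m model_1,model_2,model_3,model_4,model_5,model_2_ptm \
        -f "$af_fasta_path" \
        -t 2021-08-18 -n 8 -p reduced_dbs
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling submit_alphafold /path/to/output/folder /path/to/input/protein_sequence.fa prints the command; prefixing the call with DRY_RUN=0 submits it. The printed form drops the shell quoting, so it is meant for inspection rather than copy and paste.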
Settings
Please see below for a description of the different settings for AlphaFold.
Settings marked with * are required.
-
-o <output_dir>
Path to a directory that will store the results. *
-
-m <model_names>
Names of models to use (a comma-separated list). *
-
-f <fasta_path>
Path to a FASTA file containing one sequence. *
-
-t <max_template_date>
Maximum template release date to consider (ISO-8601 format, i.e. YYYY-MM-DD). Important if folding historical test sets. *
-
-b <benchmark>
Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for running inference on many proteins (default: False).
-
-g <use_gpu>
Enable NVIDIA runtime to run with GPUs (default: True).
-
-p <preset>
Choose the preset model configuration: no ensembling and smaller genetic database config (reduced_dbs), no ensembling and full genetic database config (full_dbs), or full genetic database config and 8 model ensemblings (casp14) (default: full_dbs).
-
-d <data_dir>
Path to a directory containing the reference databases. Use this option if you want to use your own reference databases.
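Some of the settings above can be checked before a job is queued. The following is a rough sketch under stated assumptions: the function names are hypothetical, the date check only tests the YYYY-MM-DD shape (not calendar validity), and the FASTA check simply counts header lines that start with ">".

```bash
# Sketch only: these helper names are illustrative, not part of run_alphafold.sh.

# Succeeds when the file is readable and contains exactly one FASTA header line.
fasta_has_one_sequence() {
    [ -r "$1" ] && [ "$(grep -c '^>' "$1")" -eq 1 ]
}

# Succeeds when the argument has the YYYY-MM-DD shape expected by -t.
is_iso_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when the argument is one of the presets accepted by -p.
is_known_preset() {
    case "$1" in
        reduced_dbs|full_dbs|casp14) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, fasta_has_one_sequence /path/to/input/protein_sequence.fa && is_iso_date 2021-08-18 && is_known_preset reduced_dbs succeeds only when all three checks pass.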
Preset Models
Please see below for a description of the different preset model configurations available. These presets control the speed and quality of AlphaFold.
-
reduced_dbs: This preset is optimized for speed and lower hardware requirements.
-
full_dbs: This preset runs with all genetic databases and with no ensembling.
-
casp14: This preset uses the same settings as were used in CASP14. It runs with all genetic databases and with ensembling.
Output
-
The AlphaFold output will be written to a subfolder of output_dir, which is set with the -o option.
-
Output includes:
-
Computed MSAs
-
Unrelaxed structures
-
Relaxed structures
-
Ranked structures
-
Raw model outputs
-
Prediction metadata
-
Section timings
-
The output_dir directory will have the following structure:
<target_name>/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto
-
Please see the AlphaFold output documentation for more information on AlphaFold output.
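After a job finishes, the listing above can be used to spot missing files. This is a minimal sketch with hypothetical helper names. It assumes the exact file set shown in this guide; the actual set may differ depending on the models and preset requested, so treat a reported gap as a prompt to inspect the job log rather than as proof of failure.

```bash
# Sketch only: helper names are illustrative. The expected file set is taken
# from the directory listing in this guide and may vary with -m and -p.

# Print the expected paths, relative to the <target_name> directory.
expected_alphafold_files() {
    printf '%s\n' features.pkl ranking_debug.json timings.json
    for af_i in 0 1 2 3 4; do
        printf 'ranked_%s.pdb\n' "$af_i"
    done
    for af_i in 1 2 3 4 5; do
        printf '%s\n' "relaxed_model_${af_i}.pdb" \
            "result_model_${af_i}.pkl" \
            "unrelaxed_model_${af_i}.pdb"
    done
    printf 'msas/%s\n' bfd_uniclust_hits.a3m mgnify_hits.sto uniref90_hits.sto
}

# Print each expected file that is absent under the given target directory.
# Returns 0 when nothing is missing, 1 otherwise.
missing_alphafold_files() {
    af_status=0
    for af_rel in $(expected_alphafold_files); do
        if [ ! -e "$1/$af_rel" ]; then
            printf '%s\n' "$af_rel"
            af_status=1
        fi
    done
    return "$af_status"
}
```

A typical call would be missing_alphafold_files /path/to/output/folder/<target_name>, which prints nothing when every listed file is present.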