✓ AIP1 Isambard-AI Phase 1 supported
✓ AIP2 Isambard-AI Phase 2 supported
✗ I3 Isambard 3 unsupported
✗ BC5 BlueCrystal 5 unsupported

Interactive Protein Design with AlphaFold

Design an insulin binder!

Abstract

This tutorial introduces using JupyterHub notebooks with ColabFold and ColabDesign to fold the structure of insulin and to design a de novo protein binder for it. Along the way it shows how to work interactively on Isambard-AI while refining the predicted structure and binder.

Prerequisites

No High Performance Computing knowledge is required. We welcome attendees from all domain backgrounds who have some familiarity with Python code and an interest in computational biology.

Learning Objectives

Attendees of this tutorial will leave with a better understanding of three major points:

Using JupyterHub notebooks with a custom kernel.
Using ColabFold and Py3DMol interactively to predict the structure of a protein and visualise it.
Designing a novel binder protein using ColabDesign.

Tutorial

1. Introduction and Setup

Welcome to the Isambard-AI Interactive AlphaFold Tutorial.

This tutorial will guide you through folding the insulin protein structure, and designing a novel protein that will bind to insulin. This will be done interactively in a JupyterHub notebook, where you can visualise the structures at every step.

Setting Up the Environment

Workshop session

If you are in a scheduled workshop, please wait for instructions from the instructor before setting up the environment. The steps to set up the environment may be different for your session.

Please follow the JupyterHub guide in the documentation page and start your notebook.

If you're going through this tutorial as part of a live workshop, check with the tutor whether you need to set a reservation name.

There are two ways to go through this workshop:

Download and copy the Jupyter notebook colabdesign.ipynb to your home directory on Isambard-AI and execute the cells one-by-one. (Hint: You can run wget in the terminal followed by the previous link to quickly download it to your home directory on the cluster.)
Create an empty notebook, then copy and paste the content from this page into cells.

The tutorial is pre-downloaded on Isambard-AI

You can find the tutorial in this directory:

/projects/public/brics/tutorials/colabdesign-tutorial.ipynb

Choosing a kernel

This tutorial depends on packages for the model and profiling tools. We have configured a kernel ready for you to use. First choose the correct ipykernel (colabdesign) from the top right menu.
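Once the kernel is selected, a quick first cell can confirm that the notebook is running in the intended environment. The sketch below only checks whether each package can be found by the current interpreter; the list of package names is an assumption based on the imports used later on this page.

```python
import importlib.util
import sys

# Top-level import names used later in this tutorial (assumed list).
REQUIRED = ["colabfold", "colabdesign", "py3Dmol", "requests", "numpy", "matplotlib"]


def missing_packages(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Interpreter:", sys.executable)
missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

If anything is reported as missing, the notebook is probably still attached to the default kernel rather than colabdesign.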

Creating your own kernel

The kernel for this tutorial is based on a conda environment. You can find the environment definition file here: colabdesign-env.yaml

The Jupyter documentation details how to create a custom kernel spec based on your environment here.

2. Folding Insulin with ColabFold

First, let's predict the 3D structure of insulin using ColabFold. The predicted structure will serve as the target for our binder design in later sections.

from colabfold.batch import get_queries, run

import os
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# Set up output directory
OUTPUT_DIR = Path("insulin_structure1")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
# Directory of alphafold parameters
DATA_DIR = "/projects/public/brics/tutorials/colabfold/"

# Define insulin sequence (A and B chains)
insulin_sequence = (
   "GIVEQCCTSICSLYQLENYCN",            # This is A chain (21 aa)
   "FVNQHLCGSHLVEALYLVCGERGFFYTPKA"  # This is B chain (30 aa)
)

# Fold insulin using ColabFold
queries = [("insulin", insulin_sequence, None, None)]
run(
    queries=queries, # Input query
    result_dir=OUTPUT_DIR, # Structure output directory
    num_models=1, # Can be increased to improve accuracy
    use_templates=False, # Don't use experimental PDB structures as templates
    num_recycles=1, # AlphaFold iteratively refines its prediction by feeding outputs back as inputs
    is_complex=True, # Whether the protein is made of a complex of chains
    data_dir=DATA_DIR, # Directory of alphafold parameters
)
print("Done! available output directories:")
! ls

We set the output directory OUTPUT_DIR to capture the AlphaFold prediction outputs, and set the directory of the alphafold parameters DATA_DIR. We next define insulin's two amino acid chains in insulin_sequence. We define our query, and the colabfold.batch.run() function then performs the structure prediction. Since insulin is a multi-chain protein, is_complex=True is required.
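Before starting a prediction it can be worth checking the input chains, since a stray character is cheaper to catch here than after a GPU run. The following is a minimal sketch; the helper name check_chains is hypothetical and is not part of ColabFold.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def check_chains(chains):
    """Return each chain's length; raise ValueError on non-standard letters."""
    lengths = []
    for i, seq in enumerate(chains):
        bad = sorted(set(seq.upper()) - STANDARD_AA)
        if bad:
            raise ValueError(f"chain {i} contains non-standard residues: {bad}")
        lengths.append(len(seq))
    return lengths


insulin_sequence = (
    "GIVEQCCTSICSLYQLENYCN",           # A chain
    "FVNQHLCGSHLVEALYLVCGERGFFYTPKA",  # B chain
)
print(check_chains(insulin_sequence))  # [21, 30]
```

The printed lengths should match the 21 and 30 residues noted in the comments of the folding cell.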

Two parameters control the accuracy–speed trade-off. num_models sets how many of AlphaFold's neural network models ColabFold runs; each model produces its own prediction and the results are ranked by confidence, so running more models gives more candidates to choose from but increases runtime. num_recycles controls how many times AlphaFold feeds its own output back as input to iteratively refine the predicted structure; more recycles generally improve accuracy, especially for larger or more complex proteins. Both are set to 1 here to keep the tutorial runtime short; you can increase them to improve prediction quality once you are familiar with the workflow.

In the next section we show profiling tools to monitor how your GPU behaves during the prediction.

Hardware Monitoring with nvdashboard

ColabFold does not report which stage the prediction has reached or how the hardware is performing. In general, when training or using machine learning models, it is important to check that you are getting the most out of your hardware. Let's check the GPUs on our machine.

We can use the NVDashboard JupyterLab extension that is pre-installed in the JupyterLab session. To start it, click on the third tab on the left-hand side, marked with a GPU symbol:

Jupyter nvdashboard

We recommend you choose "GPU Memory" and "GPU Utilization".

How does ColabFold use the GPU?

It is always good practice to see how much of the GPU's available memory and performance the model is currently using.
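If you prefer numbers to plots, the same information can be read from a notebook cell. This is a rough sketch that assumes the nvidia-smi command-line tool is available on the node; it returns an empty list when the tool is missing or fails, and the helper names are hypothetical.

```python
import shutil
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def _number(field):
    """Convert a CSV field to float, or None for values such as '[N/A]'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for line in text.strip().splitlines():
        util, used, total = (_number(field.strip()) for field in line.split(","))
        rows.append({"util_percent": util, "mem_used_mib": used, "mem_total_mib": total})
    return rows


def query_gpus():
    """Return one dict per visible GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except subprocess.CalledProcessError:
        return []
    return parse_gpu_csv(result.stdout)


for index, gpu in enumerate(query_gpus()):
    print(f"GPU {index}: {gpu}")
```

Running this cell while the prediction is in progress (for example from a second notebook) gives a snapshot to compare against the NVDashboard plots.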

3. Visualising the folded protein against the true structure using Py3DMol

Now let's compare our predicted insulin structure against the experimentally determined reference. We load the best-ranked predicted PDB file from the previous step, fetch the known structure from the RCSB Protein Data Bank (PDB ID: 4INS), and display them side-by-side using Py3DMol.

import py3Dmol
import requests

# Load structures
OUTPUT_DIR = Path("insulin_structure1")
pdb_file = sorted(OUTPUT_DIR.glob("insulin_*_rank_001*.pdb"))[0]
with open(pdb_file, 'r') as f:
    predicted_pdb = f.read()

PDB_ID = "4INS"
response = requests.get(f'https://files.rcsb.org/download/{PDB_ID}.pdb1', timeout=30)
response.raise_for_status()  # Stop here if the download failed
true_pdb = response.text

# Create side-by-side viewer
view = py3Dmol.view(width=1000, height=500, viewergrid=(1,2))

# Left: Experimental structure
view.addModel(true_pdb, 'pdb', viewer=(0,0))
view.setStyle({'cartoon': {'color': 'spectrum'}}, viewer=(0,0))
view.zoomTo(viewer=(0,0))

# Right: Predicted structure (colored by pLDDT confidence)
# pLDDT is stored in B-factor: blue = high confidence, red = low
view.addModel(predicted_pdb, 'pdb', viewer=(0,1))
view.setStyle({'cartoon': {'colorscheme': {'prop': 'b', 'gradient': 'rwb', 'min': 60, 'max': 85}}}, viewer=(0,1))
view.zoomTo(viewer=(0,1))

view.show()

The left panel shows the experimental structure (4INS), coloured with a spectrum gradient that runs along the sequence by residue number. The right panel shows our ColabFold prediction, coloured by the pLDDT confidence score ranging from low (red) to high (blue). AlphaFold outputs two key confidence metrics:

pLDDT (predicted Local Distance Difference Test) is a per-residue score (0–100) reflecting AlphaFold's confidence that distances between neighbouring residues are correct. High values (blue, >85) indicate well-structured regions; low values (red, <60) often indicate disordered or flexible loops.
PAE (Predicted Aligned Error) is a pairwise matrix giving, for each pair of residues, the expected positional error (in Ångströms) of one residue when the predicted and true structures are aligned on the other. Lower values indicate higher confidence. PAE is useful for assessing domain-domain or protein-protein interface confidence.

Note that both metrics are the model's own confidence estimates, not a comparison against experimentally determined structures.
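To make the interface use of PAE concrete, the sketch below averages only the entries of a PAE matrix that link residues in different chains. It assumes the matrix is already loaded as a square nested list in residue order; how you load it (for example from a scores file written next to the predicted PDB) is left out, and the function name is hypothetical.

```python
def mean_interchain_pae(pae, chain_lengths):
    """Average PAE over residue pairs that belong to different chains."""
    # Map each residue index to the index of the chain it belongs to.
    chain_of = [c for c, n in enumerate(chain_lengths) for _ in range(n)]
    if len(chain_of) != len(pae):
        raise ValueError("chain lengths do not add up to the PAE matrix size")
    values = [
        pae[i][j]
        for i in range(len(pae))
        for j in range(len(pae))
        if chain_of[i] != chain_of[j]
    ]
    return sum(values) / len(values)


# Toy 3-residue example: chain 0 has two residues, chain 1 has one.
toy_pae = [
    [0.0, 1.0, 8.0],
    [1.0, 0.0, 6.0],
    [7.0, 9.0, 0.0],
]
print(mean_interchain_pae(toy_pae, [2, 1]))  # 7.5
```

For the insulin prediction the chain lengths would be [21, 30]. A low inter-chain average suggests the model is confident about how the A and B chains sit relative to each other.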

Compare the True and Predicted PDBs

Does the predicted PDB match the true experimentally derived structure?

It is important to create an accurate structure before continuing to the next step. Insulin should have 3 alpha helices in total across the A and B chains.

Aim for the alpha helices to be blue (a pLDDT score of about 80 or higher on the 0–100 scale) before moving on to the next section.
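Judging colour by eye can be backed up with a number. The sketch below reads the per-residue pLDDT from the B-factor column of the predicted PDB text and reports the fraction of residues at or above a threshold. The default threshold of 80 on the 0–100 scale and the helper names are assumptions for illustration.

```python
def ca_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} from the CA atoms' B-factor column."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22, residue 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def fraction_confident(scores, threshold=80.0):
    """Fraction of residues whose pLDDT is at or above the threshold."""
    if not scores:
        return 0.0
    return sum(value >= threshold for value in scores.values()) / len(scores)
```

In the notebook you could call `fraction_confident(ca_plddt(predicted_pdb))`, where predicted_pdb is the text loaded in the visualisation cell. If the fraction is low, try increasing num_recycles or num_models and fold again.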

Visualise your folded protein.

4. Creating an insulin binding protein using ColabDesign

In this section we use ColabDesign to design a novel protein that will bind to insulin. ColabDesign is a framework that wraps several state-of-the-art protein design methods, including ProteinMPNN (which designs sequences for a fixed backbone) and RFdiffusion (which generates new backbone structures). Here we use the AlphaFold design (AFDesign) method, which optimises both the sequence and structure simultaneously by differentiating through AlphaFold's own predictions — meaning the binder is designed to score well under AlphaFold's model of protein interactions.

As we are designing a custom binder, we can set an arbitrary length of amino acids for our protein. You are free to set binder_len to a value of your choosing — we recommend between 10 and 20 residues.

from colabdesign import mk_afdesign_model, clear_mem
from colabdesign.shared.utils import copy_dict

# Clear GPU memory from previous AlphaFold run
clear_mem()

# Find the best-ranked predicted structure from Cell 1
OUTPUT_DIR = Path("insulin_structure1")
pdb_file = sorted(OUTPUT_DIR.glob("insulin_*_rank_001*.pdb"))[0]
print(f"Using structure: {pdb_file.name}")

# Initialize the binder design model
af_model = mk_afdesign_model(
    protocol="binder",      # Design a new protein that binds to a target
    use_multimer=True,      # Use AlphaFold-Multimer (required for protein-protein interactions)
    num_recycles=3,         # Number of refinement iterations during design
    data_dir=DATA_DIR     # Path to AlphaFold model weights
)

# Define the design task
BINDER_LEN = 15  # ! Choose a value between 10 and 20 (15 is only a placeholder default)
af_model.prep_inputs(
    pdb_filename=str(pdb_file),  # Target structure to design a binder for
    chain="A,B",                  # Which chains are the target (insulin A and B chains)
    binder_len=BINDER_LEN,       # Length of the binder sequence to design (in residues)
)

# Configure optimization settings for scoring
af_model.set_opt(
    num_recycles=3,   # Recycles during scoring (can differ from design recycles)
    num_models=3,     # Number of AF2 models to ensemble (1 = faster, 5 = more robust)
)

# Run 3-stage design optimization
# Stage 1 (soft): Continuous relaxation of sequence space - fast exploration
# Stage 2 (temp): Simulated annealing - gradually discretize to real amino acids  
# Stage 3 (hard): Discrete optimization - fine-tune the final sequence
af_model.design_3stage(
    soft_iters=100,   # Iterations for soft
    temp_iters=100,   # Iterations for temp
    hard_iters=10,    # Iterations for hard
)

# Save the designed complex (binder + insulin) as PDB
af_model.save_pdb("designed_binder1.pdb")

# Extract and display the designed binder sequence
designed_sequence = af_model.get_seqs()[0]
print(f"Designed binder sequence: {designed_sequence}")
print(f"Sequence length: {len(designed_sequence)} residues")
print(f"Binding metric ipTM: {af_model.aux['i_ptm']:.3f}")

The design runs in three stages via design_3stage().

The soft stage treats the sequence as a continuous distribution over all amino acids, allowing fast exploration of sequence space.
The temp stage uses simulated annealing to gradually discretise this into real amino acid choices.
The hard stage then fine-tunes the final discrete sequence directly.

To improve binding quality, you can increase soft_iters, temp_iters, or hard_iters for more thorough optimisation, or increase num_models in set_opt() to ensemble more AlphaFold models during scoring — both at the cost of longer runtime.
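The move from a soft distribution to a discrete sequence can be pictured with a temperature-scaled softmax. This is a toy illustration of the general idea only, not ColabDesign's implementation; the logits below are made-up values for a single binder position.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert per-amino-acid logits into probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one binder position over four amino acids.
logits = {"A": 1.0, "L": 2.0, "K": 0.5, "E": 1.5}

for temperature in (1.0, 0.5, 0.1, 0.01):
    probs = softmax(list(logits.values()), temperature)
    summary = ", ".join(f"{aa}={p:.2f}" for aa, p in zip(logits, probs))
    print(f"T={temperature:<5} {summary}")
```

At a temperature of 1 the position is a blend of several amino acids, which corresponds to the soft picture. As the temperature falls, the probability concentrates on the highest-scoring residue until the choice is effectively discrete, which corresponds to the hard picture.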

5. Visualise and evaluate your de-novo binder

Now let's visualise the designed complex and assess how well our binder is predicted to interact with insulin. The code colours insulin (chains A and B) in blue and the designed binder in red, so you can clearly see how the two proteins sit relative to each other.

view = py3Dmol.view(width=600, height=400)
view.addModel(open("designed_binder1.pdb", "r").read(), "pdb")

# Color each chain differently
view.setStyle({'chain': 'A'}, {'cartoon': {'color': 'blue'}})   # Insulin
view.setStyle({'chain': 'B'}, {'cartoon': {'color': 'red'}})   # Binder  
view.zoomTo()
view.show()

Alongside the structure, the code also prints the ipTM (Interface predicted TM-score), which is the key metric for evaluating how confidently AlphaFold predicts the two proteins are actually interacting. Unlike pLDDT (which scores individual residues), ipTM specifically measures the quality of the predicted interface between chains. Use the table below to interpret your result:

ipTM	Interpretation
> 0.8	High confidence interaction
0.6 - 0.8	Moderate confidence
< 0.6	Low confidence / likely not binding well

Where does your protein bind to insulin?

Look at where the red binder chain contacts the blue insulin structure. Does it bind at a specific region of insulin, or does it make broad contact? Compare your results across different runs: do binders of different lengths or from different design runs tend to bind at the same site?
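To compare binding sites across runs without relying on the viewer alone, you can list which target residues lie close to the binder. The sketch below uses a simple CA–CA distance cutoff. The 8 Å cutoff is a common heuristic rather than a value from this tutorial, and the default chain IDs ("A" for insulin, "B" for the binder) are an assumption that follows the colouring code above.

```python
import math


def ca_atoms(pdb_text):
    """Return [(chain, residue_number, (x, y, z))] for every CA atom."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


def interface_residues(pdb_text, target_chain="A", binder_chain="B", cutoff=8.0):
    """Target residue numbers with a CA within `cutoff` angstroms of any binder CA."""
    atoms = ca_atoms(pdb_text)
    target = [atom for atom in atoms if atom[0] == target_chain]
    binder = [atom for atom in atoms if atom[0] == binder_chain]
    return sorted({
        resnum
        for _, resnum, xyz in target
        if any(math.dist(xyz, binder_xyz) <= cutoff for _, _, binder_xyz in binder)
    })
```

Calling `interface_residues(open("designed_binder1.pdb").read())` on the output of several design runs and comparing the returned residue numbers shows whether the binders converge on the same patch of insulin.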

(Bonus round) Visualise a real drug (Keytruda)

To put our work in context, let's visualise a real approved cancer drug. Keytruda (pembrolizumab) is an antibody immunotherapy that works by blocking the PD-1 checkpoint receptor, allowing the immune system to attack tumour cells. It is approved to treat over 20 cancer types, including melanoma, lung cancer, colorectal cancer, and breast cancer, making it one of the most widely used cancer immunotherapies in the world.

Here we load the crystal structure of Keytruda bound to PD-1 (PDB: 5B8C) directly from the RCSB, exactly as we did for insulin in section 3.

PDB_ID = "5B8C" # Keytruda bound to PD-1
keytruda_pdb = requests.get(f'https://files.rcsb.org/download/{PDB_ID}.pdb1').text
view = py3Dmol.view(width=800, height=600)
view.addModel(keytruda_pdb, 'pdb')
# PD-1 (target) in blue
view.setStyle({'chain': 'A'}, {'cartoon': {'color': 'blue'}})
# Keytruda (antibody) in red
view.setStyle({'chain': 'B'}, {'cartoon': {'color': 'red'}})
view.setStyle({'chain': 'C'}, {'cartoon': {'color': 'red'}})
view.zoomTo()
view.show()

The blue chain is PD-1 and the red chains form the Keytruda antibody. Notice how tightly the antibody wraps around a specific region of PD-1 — this precise interface is what makes Keytruda an effective drug. The workflow you used in this tutorial to design a binder to insulin is conceptually the same process used in early-stage computational drug discovery.

Closing the notebook

Workshop session

If you are in a scheduled workshop, please follow instructions from the instructor for closing your Jupyter notebook session. The steps for closing the notebook session may be different for your session.

When you are finished working in JupyterLab, shut down the server using the File > Shut Down option from the JupyterLab menu.

Conclusion

In this tutorial you have used Isambard-AI's GPU hardware interactively to carry out a real computational protein design workflow:

Folded a protein from sequence — using ColabFold to predict the 3D structure of insulin from its amino acid sequence, and monitoring GPU utilisation during inference with NVDashboard.
Evaluated a structure prediction — using Py3DMol to compare the predicted insulin structure against the experimentally determined reference, and interpreting per-residue confidence via the pLDDT score.
Designed a novel binder protein — using ColabDesign's AFDesign method to optimise a completely new protein sequence and structure to bind insulin, running a three-stage gradient-based design pipeline.
Assessed binding quality — interpreting the ipTM score to evaluate the predicted interaction between your binder and insulin.

The same tools and approaches used here underpin cutting-edge research in structural biology and computational drug discovery.

Resources

Exxact Corp blog post on colabdesign.

Acknowledgements

Created: 09/02/2026. Authors: Wahab Kawafi, Jon Lees.