Using ChimeraX to analyse AlphaFold predictions

Thibault Tubiana, Chloé Quignot

Last updated: 2025-12-10

Context & setup

In this practical session, you will learn how to use ChimeraX in order to visualise AlphaFold outputs interactively.

Why ChimeraX?

ChimeraX is developed by the Resource for Biocomputing, Visualization, and Informatics (RBVI) at UCSF. It is released under a non-commercial license for academic, government, nonprofit, and personal use, and its source code is available on GitHub. It is particularly well suited to analysing AlphaFold outputs thanks to its many features for interactive score visualisation.

Installing ChimeraX

ChimeraX can be downloaded from its website for any OS (Windows, Linux, Mac): https://www.cgl.ucsf.edu/chimerax/download.html

Additional information

The ChimeraX help pages are comprehensive: https://www.rbvi.ucsf.edu/chimerax/docs/user/index.html

ChimeraX interface

The ChimeraX window is composed of the Menu Bar & Tool Bar across the top, the command line at the bottom, and several detachable and movable panels in between, including:

Log Panel: every action is recorded here, and each entry is clickable to open its help page
Models Panel: all objects (e.g. molecules) are listed here, each with an ID
Working Panel: this is where you visualise your objects

Input data

We will be using the pre-computed data from the previous practical “Applying MassiveFold to an unknown monomeric protein”.

Download the pre-computed outputs (zip folder called A0A2U7UDN4.zip) from NextCloud
Unzip the folder
You should have the following folder architecture:

A0A2U7UDN4/
├── af3_default/ # AlphaFold3 outputs
│   └──...
├── afm_default/ # AFmassive outputs
│   └──...
├── cf_default/ # ColabFold outputs
│   └──...
├── msas/ # MSAs used in predictions
│   ├── bfd_uniref_hits.a3m
│   ├── mgnify_hits.sto
│   ├── pdb_hits.hhr
│   └── uniref90_hits.sto
├── msas_alphafold3/ # AF3-formatted MSAs -> json format
│   └── msas_alphafold3_data.json
└── msas_colabfold/ # colabfold-formatted MSAs
    └── 0.a3m

The files that are important for this practical are:

the predicted structures, recognisable by their .pdb or .cif extension (depending on what predictor was used)
their associated scores in pickle format (Python binary files), stored within the light_pkl sub-folders

⚠️ WARNING: make sure to match rank, seed and model numbers between the .cif/.pdb and .pkl files when importing them into ChimeraX!

Some basics

Open the most confident AFmassive prediction in ChimeraX

Hint:

Look for a .pdb file within the afm_default folder with “ranked_0” in its name.
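
For reference, the same file can be opened from the command line. This is only a sketch: it assumes the .pdb file sits directly inside afm_default/ (as the AlphaFold3 and ColabFold files do in their own folders), and /path/to/A0A2U7UDN4 is a placeholder for wherever you unzipped the data.

```
# move to the unzipped data folder (placeholder path, adapt it to your machine)
cd /path/to/A0A2U7UDN4
# open the top-ranked AFmassive prediction (it becomes model #1 if nothing else is open)
open afm_default/ranked_0_unrelaxed_model_3_ptm_pred_2.pdb
```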

Change the lighting of the protein and the background colour

find the following icons and activate them:

Answer:

These 3 icons are within the “Graphics tab”. This tab groups quick-access tools to change background colour and lighting effects of your window.

Don’t hesitate to play around a little with the different effects.
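
Comparable settings can also be applied from the command line. The sketch below uses three standard ChimeraX commands; which three icons the exercise refers to is an assumption here, so treat the values as examples to experiment with.

```
# white background
set bgColor white
# soft ambient lighting (other presets include simple, full and flat)
lighting soft
# draw outlines around the structure
graphics silhouettes true
```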

Structure after activating the above settings

Play around with the following commands to get familiar with them & check they work for you

Don’t be allergic to using the command line…

You don’t have to use the command line in ChimeraX for basic functions, but it offers more options and is quite easy to master, especially thanks to the messages in the log panel.

Whenever you make changes in ChimeraX through the buttons and menus, the corresponding command shows up in the log panel. Each command in the log panel is a clickable link that brings you straight to the corresponding help page and detailed usage. All help pages can also be found in the ChimeraX user guide (https://www.rbvi.ucsf.edu/chimerax/docs/user/index.html).

The command syntax is very simple: it always starts with the command name, and spaces are used as separators. Each command has its own options, which may or may not take values.

command name + option not needing a value e.g. color bychain
- command name: color
- option name: bychain
command name + option needing a value e.g. set bgColor white
- command name: set
- option name: bgColor
- option value: white

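Most commands can be applied to a specific selection only. For this, add a target specification after the command name. A specification combines a model ID (#), a chain ID (/), residue numbers (:) and atom names (@), e.g. color #1/A:10-50 red colours residues 10 to 50 of chain A in model 1.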
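
As an illustration of target specifications, here are a few commands that go from a whole model down to individual atoms. The model ID (#1), the chain ID (A) and the residue range (10-50) are assumptions chosen for the example; check the models panel for the IDs in your own session.

```
# colour the whole of model #1
color #1 tan
# colour only chain A of model #1
color #1/A cornflowerblue
# colour only residues 10 to 50 of chain A
color #1/A:10-50 orange
# show the C-alpha atoms of those residues
show #1/A:10-50@CA atoms
```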

Visualising AlphaFold (AFmassive) results

Colour the model by pLDDT (pLDDT scores are often saved in the bfactor field)

find the following icon and activate it:

Answer:

You will find this icon within the “Molecule Display” tab. This tab groups quick-access tools to change the level of structural detail shown (atomic, secondary structure and surface representations) as well as the colouring (by heteroatom, by chain, by position in the sequence, electrostatics & hydrophobicity…).

Don’t hesitate to play around a little with the different colours & representations.

To change the default colour palette used for B-factor colouring, use the command line instead and specify the alphafold colour palette:

color bfactor palette alphafold

Identify the least and most confidently-predicted portions of the structure according to the pLDDT

what kind of secondary structure was predicted for these regions?
does this surprise you?

Answer:

As a reminder, pLDDT scores are a per-residue score between 0 and 100. The higher the score, the higher the confidence in the local environment of a residue.

Commonly-accepted pLDDT colour scale from the EBI tutorial - you can even consider very low-scoring regions as being possibly disordered

As you can see, loops and terminal regions have a low confidence whereas well-structured regions have a high confidence i.e. AlphaFold is more confident in its prediction of the well-structured regions, and is not sure about the structure of the loop and terminal regions.

This is not at all surprising: Helices and sheets have well-defined, repeating backbone geometries and stabilising interactions. They are also often more evolutionarily conserved. All this information makes these regions easier to predict, thus AlphaFold is more confident in what it outputs.
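
If the low- and high-confidence regions are hard to spot by eye, they can also be selected by value. This is a minimal sketch: it assumes the pLDDT is stored in the B-factor field, and the thresholds of 50 and 90 are illustrative choices.

```
# colour by pLDDT with the AlphaFold palette
color bfactor palette alphafold
# select atoms with a pLDDT below 50 (least confident regions)
select @@bfactor<50
# select atoms with a pLDDT above 90 (most confident regions)
select @@bfactor>90
# clear the selection when you are done
select clear
```

The selected atoms are highlighted with a green outline in the working panel, which makes it easy to see what kind of secondary structure they belong to.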

Import the scores associated with the model you have in ChimeraX

The current model is ranked_0_unrelaxed_model_3_ptm_pred_2.pdb; its associated score file can be found within afm_default/light_pkl/. Pick the file with the same model and pred numbers.

You can import the scores into ChimeraX through the Menu bar in Tools > Structure Prediction > AlphaFold Error Plot.

Help:

The correct pickle file to import is result_model_3_ptm_pred_2.pkl within the afm_default/light_pkl/ folder.

After importing the scores, an extra panel should appear, which you can dock into your ChimeraX main window if you wish.
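
The import can also be done with the alphafold pae command. The sketch below assumes that the AFmassive model is model #1 and that the ChimeraX working directory is the unzipped A0A2U7UDN4 folder; adapt the model ID and the path if your session differs.

```
# associate the PAE scores with model #1 and open the error plot
alphafold pae #1 file afm_default/light_pkl/result_model_3_ptm_pred_2.pkl
```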

Best AFmassive model & its PAE plot in ChimeraX

Note the buttons at the bottom of this new panel:
- You can colour more easily by pLDDT, but also detect well-structured domains within the PAE plot
- If you hover over the PAE plot with your mouse, you will also see that the values next to the Help button change (= predicted aligned error values per residue pair)

Highlight the lighter regions in the PAE matrix

to what regions do they correspond in the structure?
are you surprised that the PAE values are high for these regions?

Answer:

As a reminder, the PAE plot is a square matrix whose width and height both equal the number of residues in your structure. It does not show scores but estimated errors on the relative position of 2 residues of the structure. Each value lies between 0 and 32 Ångström. You can see it as a “±” value to add to the actual distance you see in the predicted structure. Thus, the lower it is, the better (i.e. AlphaFold is more confident in the distance that it has predicted between 2 residues).

The default colour scale of the PAE matrix is explained when you click on the Help button. The colour scale tries to reflect the pLDDT colours: blue for low error values, yellow-orange for medium values and grey-white regions for high error values.

When you highlight a light area in the PAE outside of the diagonal, it will show the corresponding regions in the structure as pink (y-axis) and green (x-axis) areas.

Unsurprisingly, these light areas are seen between residues of well-structured regions and predicted loop regions with very poor pLDDT scores. Indeed, loops have generally fewer structural constraints (they are often linkers or turns), so their exact position relative to other parts of the protein is harder to predict (i.e. more prone to error).

Colour the structure according to domains in the PAE

how many domains do you see in the 3D structure?
how many domains does ChimeraX detect from the PAE plot?

Answer:

Well-packed domains typically appear as blocks of low error along the diagonal of the plot. These blocks represent regions where residues within the domain have low predicted positional error relative to each other, indicating a stable, well-folded structure.

ChimeraX has many built-in features, including the identification of predicted well-packed domains from a PAE plot (first button at the bottom of the PAE plot). You can then right-click on the PAE plot to colour it like the structure while keeping the PAE values as a grey-scale background. This makes it easier to find correspondences between the structure and the PAE plot, and to read the PAE plot.

Showing the predicted domains on the PAE

As you can see from the colours, ChimeraX found 3 different domains from the plot: 2 loops and the rest of the protein as a third well-packed domain. This is quite consistent with the structure.
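
The domain colouring can also be triggered from the command line with the colorDomains option of alphafold pae. As before, this sketch assumes that the AFmassive model is model #1 and that the working directory is the unzipped data folder. The file is given again so that the command works on its own.

```
# load the PAE scores for model #1 and colour the structure by PAE-derived domains
alphafold pae #1 file afm_default/light_pkl/result_model_3_ptm_pred_2.pkl colorDomains true
```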

PAE plots become particularly interesting with multi-domain and/or multi-chain proteins as they will help identify if 2 domains are likely to interact (lower PAE values) or not (higher PAE values). Unless the interaction is very strong (e.g. obligate oligomer), you should not expect the PAE values to match the intra-domain ones (the signal will be more diluted).

Colour the PAE plot according to the pLDDT score

are pLDDT and PAE values in agreement along the structure?

Answer:

To help you answer this question, you can use the same trick as before: first colour the structure by pLDDT, then right-click on the PAE plot to update its colouring to match the structure.

The resulting plot isn’t the easiest to read, but if you look closely, it depicts the pLDDT score along the diagonal, together with the PAE error values in grey scale.

PAE values and pLDDT scores represent 2 different types of confidence levels, but as you can see, in this prediction, they are quite in agreement in the predicted domains.

Comparing results between methods

Open the most confident ColabFold and AlphaFold3 predictions

AlphaFold3: af3_default/ranked_0_af3_seed_409255_sample_2_pred_17.cif
ColabFold: cf_default/ranked_0_unrelaxed_model_3_ptm_pred_0.pdb

NB: you can colour all structures by model by typing color bymodel in the command line at the bottom of the window
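
Both files can be opened from the command line as well. The sketch assumes that the working directory is the unzipped A0A2U7UDN4 folder and that the AFmassive model is already open as model #1, so that the new structures become models #2 and #3.

```
# open the top-ranked AlphaFold3 prediction (expected to become model #2)
open af3_default/ranked_0_af3_seed_409255_sample_2_pred_17.cif
# open the top-ranked ColabFold prediction (expected to become model #3)
open cf_default/ranked_0_unrelaxed_model_3_ptm_pred_0.pdb
# give each model its own colour
color bymodel
```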

Align and compare them to the AFmassive model we have analysed previously

You can use the “Matchmaker” tool for this through the Menu bar in Tools > Structure Analysis > Matchmaker.

Help:

Matchmaker settings: select the reference and which models you want to align on it

NB: you can also restrict to a given selection by selecting the regions before running matchmaker, then by ticking the “Also restrict to selection” boxes.
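
The equivalent command is matchmaker, which takes the models to move first and the reference after the to keyword. In this sketch, the model IDs are assumptions (AFmassive as #1, AlphaFold3 as #2, ColabFold as #3); check them in the models panel.

```
# superimpose the AlphaFold3 (#2) and ColabFold (#3) models onto the AFmassive model (#1)
matchmaker #2,3 to #1
```

The RMSD of each alignment is reported in the log panel.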

As you can see, the structures do not superimpose that cleanly:

Superimposed best models (beige: AFmassive, blue: AF3, pink: ColabFold)

To get a split view, you can use the tile function in the command line:

tile columns 3 spacing_factor 1

In the above command, we specify that we want 3 columns and we reduce the default spacing between models to a factor of 1. To remove tiling, you can just type: tile off

“Tiled” best models (beige: AFmassive, blue: AF3, pink: ColabFold)

Comparing with real structures

Many proteins share the same fold despite having <20% sequence identity. Since structure is closely linked to function, finding structurally similar proteins can help to better characterise a given protein of interest. In this case, our protein (UniProt ID: A0A2U7UDN4_9VIRU) from Pandoravirus neocaledonia is not very well characterised (see UniProt page).

Go back to your Foldseek output page and have a look at the top hits and their scores

Foldseek output on the top-scoring AFmassive prediction

Fetch one of the top-scoring PDB structures identified through Foldseek into ChimeraX

Open > Fetch by ID > PDB and type the PDB id.
Clean up the structure: e.g. in the case of 4FVJ, only chain B is of interest.

You can split the 4FVJ model into its individual chains using the split command:

split #4 chains

This will create 8 sub-models in the models panel. You can hide all chains with hide #4 atoms and then just show chain B with show #4.2 cartoon, or just tick the corresponding boxes in the models panel.

Align the PDB structure to the best prediction with Matchmaker
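
From the command line, this alignment could look like the sketch below. It assumes the model numbering used above: the AFmassive prediction is model #1, and chain B of 4FVJ is sub-model #4.2 after the split.

```
# superimpose chain B of 4FVJ (#4.2) onto the AFmassive prediction (#1)
matchmaker #4.2 to #1
```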

Compare the two structures

Is the fold found with Foldseek in a high-confidence region?

Answer:

Tiling with the tile command can be useful here to see more clearly. If you set both structures side by side and colour the model by pLDDT, you can see that the confidence score is reasonably good for the region that overlaps with 4FVJ.
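
One possible way to set up this side-by-side view from the command line is sketched below. The model IDs are assumptions carried over from the previous steps (#1: AFmassive, #2 and #3: AlphaFold3 and ColabFold, #4.2: chain B of 4FVJ).

```
# hide the AlphaFold3 and ColabFold models to keep the view readable
hide #2,3 models
# colour the AFmassive prediction by pLDDT
color bfactor #1 palette alphafold
# place the prediction and the PDB chain side by side
tile #1,4.2 columns 2
# go back to the superimposed view when you are done
tile off
```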

Tiled best model coloured by pLDDT vs the aligned PDB structure (4FVJ) + the PAE matrix of the model

Run this analysis with other top hits from Foldseek.