Documentation

This document starts with some background on the multimer construction from pdb coordinates. The usage of this program is explained further down.

The BIOMOLECULE information in pdb files

Many pdb files that represent molecules that are multimeric in vivo do not actually contain the coordinates of all subunits of the multimer, even if the multimer was observed within the crystal. Instead, instructions are provided for how to construct the missing subunits from the ones that are given. An example, from 5cev.pdb:

REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: HEXAMERIC
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   2  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   2  0.000000  0.000000 -1.000000       69.40000

The lines containing BIOMT specify the transformation matrices to be applied to each of the chains A, B and C. Each transformation is specified in three lines, BIOMT1 to BIOMT3, which together hold a 3x3 rotation matrix (the first three numeric columns) and a translation vector (the last column). Two transformations are listed here, numbered 1 and 2, so we obtain two copies of each of A, B and C: a hexamer. All that MakeMultimer.py does is carry out these instructions and write the result to a new pdb file.
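
To make the arithmetic concrete, here is a minimal sketch in Python of how such a transformation could be applied to a single atom. It is not taken from MakeMultimer.py itself; the function names parse_biomt and apply_transform and the atom coordinates are made up for illustration, and the parser assumes that the three rows of each transformation appear in order.

```python
def parse_biomt(lines):
    """Collect BIOMT rows into {serial: [(r1, r2, r3, t), ...]}, three rows each."""
    transforms = {}
    for line in lines:
        fields = line.split()
        # Expected shape: REMARK 350 BIOMTn serial r1 r2 r3 t
        if (len(fields) == 8 and fields[:2] == ["REMARK", "350"]
                and fields[2].startswith("BIOMT")):
            serial = int(fields[3])
            row = tuple(float(value) for value in fields[4:8])
            # Assumption: BIOMT1, BIOMT2, BIOMT3 appear in that order.
            transforms.setdefault(serial, []).append(row)
    return transforms


def apply_transform(rows, xyz):
    """Return rotation * xyz + translation for one transformation."""
    x, y, z = xyz
    return tuple(r1 * x + r2 * y + r3 * z + t for r1, r2, r3, t in rows)


# The BIOMT lines of the 5cev.pdb example above.
remark_350 = [
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   2  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000 -1.000000       69.40000",
]

transforms = parse_biomt(remark_350)
atom = (10.0, 20.0, 30.0)  # made-up coordinates, not from 5cev.pdb
copies = {serial: apply_transform(rows, atom)
          for serial, rows in transforms.items()}
for serial, xyz in sorted(copies.items()):
    print(serial, xyz)
```

Transformation 1 leaves the atom where it is, while transformation 2 flips the signs of x and z and then shifts z by 69.4, so the made-up atom ends up at approximately (-10.0, 20.0, 39.4).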

If you look closely at the first three BIOMT lines, you will notice that they contain only 1s and 0s. In fact, the 1s are placed such that the transformation will not move the molecule at all; these lines therefore simply copy the original A,B,C chains into the new hexamer. In many pdb files, you will see only lines like these:

REMARK 350 BIOMOLECULE: 1
REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: TETRAMERIC
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000

This is a do-nothing operation. MakeMultimer.py will happily carry out these instructions, but the resulting molecule will be exactly the same as before.
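
If you want to check in advance whether a transformation is of this do-nothing kind, a small test like the following sketch will do. The function name is_identity and the tolerance are assumptions chosen for illustration; MakeMultimer.py itself does not necessarily perform such a check.

```python
def is_identity(rows, tol=1e-6):
    """True if three (r1, r2, r3, t) rows describe 'do not move anything'."""
    for i, (r1, r2, r3, t) in enumerate(rows):
        # Row i of the identity matrix has a 1 in column i and 0 elsewhere.
        expected = [1.0 if j == i else 0.0 for j in range(3)]
        if any(abs(v - e) > tol for v, e in zip((r1, r2, r3), expected)):
            return False
        # The translation component must vanish as well.
        if abs(t) > tol:
            return False
    return True


# Transformation 1 from the tetramer example above.
identity_rows = [
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
]
# Transformation 2 from the 5cev.pdb example further up.
flip_rows = [
    (-1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, -1.0, 69.4),
]

print(is_identity(identity_rows))
print(is_identity(flip_rows))
```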

Some files contain multiple biomolecule records, for example:

REMARK 350 BIOMOLECULE: 1
REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: TETRAMERIC
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D
...
REMARK 350 BIOMOLECULE: 2
REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: TETRAMERIC
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, D
...
REMARK 350 BIOMOLECULE: 3
REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: DIMERIC
...

In these cases, MakeMultimer.py will generate a separate output file for each biomolecule.
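
The grouping of REMARK 350 lines by biomolecule can be sketched as follows. This is an illustration only, with hypothetical function names (split_biomolecules, chains_for); it uses just the lines of records 1 and 2 that are shown in full above and leaves out everything that is elided there.

```python
import re


def split_biomolecules(lines):
    """Group REMARK 350 lines by the BIOMOLECULE number that precedes them."""
    groups = {}
    current = None
    for line in lines:
        match = re.match(r"REMARK 350 BIOMOLECULE:\s*(\d+)", line)
        if match:
            current = int(match.group(1))
            groups[current] = []
        elif current is not None and line.startswith("REMARK 350"):
            groups[current].append(line)
    return groups


def chains_for(lines):
    """Chain identifiers listed on 'APPLY THE FOLLOWING TO CHAINS' lines."""
    prefix = "REMARK 350 APPLY THE FOLLOWING TO CHAINS:"
    chains = []
    for line in lines:
        if line.startswith(prefix):
            names = line[len(prefix):].split(",")
            chains += [name.strip() for name in names if name.strip()]
    return chains


remark_350 = [
    "REMARK 350 BIOMOLECULE: 1",
    "REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: TETRAMERIC",
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D",
    "REMARK 350 BIOMOLECULE: 2",
    "REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: TETRAMERIC",
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, D",
]

groups = split_biomolecules(remark_350)
for number in sorted(groups):
    print(number, chains_for(groups[number]))
```

Each group would then be expanded on its own and written to its own output file.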

Program usage and options

Basic usage of MakeMultimer.py is straightforward enough - either upload a local pdb file (choose upload from the menu on the left) or specify a pdb code to be retrieved from the Protein Data Bank (choose retrieve).

The output can be controlled by several options. The first two options serve to provide unique identification for the replicated chains, through renaming and residue renumbering of chain copies that result from matrix expansion.

Use new chain name every

When a chain is replicated, the copies can be named differently from the original.

If this option is 0, no renaming will occur - all copies will be named like the original. If it is 1, every copy will get a new name.

If the value of this option is, for example, 5, then copies 1 (usually the original) and 2-5 retain the original name, copies 6-10 will share the next available name, and so on. This is limited by the number of available names (A-Z).
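
The naming rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the code of MakeMultimer.py: the function name chain_names is made up, "next available name" is taken to mean the next letter from A-Z that is neither the original name nor listed in the hypothetical parameter used, and running out of letters simply raises an error here.

```python
import string


def chain_names(original, n_copies, rename_every, used=()):
    """Names for copies 1..n_copies of one chain."""
    if rename_every == 0:
        # No renaming: all copies are named like the original.
        return [original] * n_copies
    available = [name for name in string.ascii_uppercase
                 if name != original and name not in used]
    names = []
    for copy_number in range(1, n_copies + 1):
        block = (copy_number - 1) // rename_every  # 0 for the first block
        if block == 0:
            names.append(original)
        elif block - 1 < len(available):
            names.append(available[block - 1])
        else:
            # Assumption: how the program reacts here is not documented.
            raise ValueError("ran out of chain names (A-Z)")
    return names


print("".join(chain_names("A", 12, 5)))
print("".join(chain_names("A", 3, 1)))
print("".join(chain_names("A", 3, 0)))
```

With a value of 5 and twelve copies of chain A, copies 1-5 are named A, copies 6-10 are named B, and copies 11-12 are named C.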

Use residue number offset

Chain copies that share a name (see above) can be distinguished from one another by renumbering the residues.

If this option is 0, no renumbering will occur. If it is 1000, then the first chain with a certain chain name will retain the original numbers (say 1-157), the second will get 1001-1157, the third 2001-2157, and so on.

Residue renumbering is subject to a limit of 9,999 as the highest residue number, which is inherent in the PDB file format. If that value is exceeded, the counter is reset to 1. Note that atoms are always renumbered uniquely per chain, too, up to a maximum of 99,999, which is again the maximum that PDB files can accommodate. On occasion, this may provide a fall-back solution for unique identification of replicated chains.
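
As a sketch of the residue arithmetic, assuming that the offset is simply added once per preceding chain with the same name: the names renumber and fits_pdb_format are hypothetical, and the reset of the counter to 1 mentioned above is deliberately not modeled here, since its exact behaviour is not spelled out.

```python
MAX_RESIDUE_NUMBER = 9999  # highest residue number, as stated above


def renumber(resid, copy_index, offset):
    """New residue number; copy_index 0 is the first chain with a given name."""
    return resid + copy_index * offset


def fits_pdb_format(resid):
    """True if the residue number stays within the limit of the file format."""
    return resid <= MAX_RESIDUE_NUMBER


# A chain with residues 1-157 and an offset of 1000.
for copy_index in range(3):
    first = renumber(1, copy_index, 1000)
    last = renumber(157, copy_index, 1000)
    print(copy_index, first, last, fits_pdb_format(last))
```

The three copies come out as 1-157, 1001-1157 and 2001-2157. An eleventh copy (copy_index 10) would reach 10157 and no longer fit, which is where the limit comes into play.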

To illustrate the effect of these two options used in combination, consider this output, which was produced with chain renaming = 5 and residue renumbering offset = 1000:

Multimer expanded from BIOMT matrix in pdb file 1X9P.pdb
by MakeMultimer.py (watcut.uwaterloo.ca/makemultimer)

-------------------------------------------------------------
Chain  original  1st resid.  last resid.  1st atom  last atom
-------------------------------------------------------------
    A         A          49          569         1       1840
    A         A        1049         1569      2001       3840
    A         A        2049         2569      4001       5840
    A         A        3049         3569      6001       7840
    A         A        4049         4569      8001       9840
    B         A          49          569         1       1840
    B         A        1049         1569      2001       3840
    B ...

Choosing the right combination of chain renaming and residue numbering offset can facilitate the selection and formatting of chains or groups of chains in your molecular viewer.

Replicate backbone only

If this option is set, only the alpha-carbon and the peptide bond groups will be included in the output. (For nucleic acids, an analogous approach is adopted that yields a single continuous strand of atoms.)
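
For proteins, such a filter might look like the following sketch. It assumes that "the alpha-carbon and the peptide bond groups" corresponds to the atom names N, CA, C and O, which is a guess rather than something taken from the program; the function name backbone_only and the coordinates in the sample lines are made up, and nucleic acids are not handled.

```python
# Assumption: these four atom names make up the retained protein backbone.
BACKBONE_NAMES = {"N", "CA", "C", "O"}


def backbone_only(lines):
    """Keep only ATOM records whose atom name is a backbone name."""
    kept = []
    for line in lines:
        # The atom name occupies columns 13-16 of a PDB ATOM record.
        if line.startswith("ATOM") and line[12:16].strip() in BACKBONE_NAMES:
            kept.append(line)
    return kept


# Made-up alanine residue; the coordinates are not from a real structure.
residue = [
    "ATOM      1  N   ALA A  49      11.104   6.134  -6.504  1.00  0.00",
    "ATOM      2  CA  ALA A  49      11.639   6.071  -5.147  1.00  0.00",
    "ATOM      3  C   ALA A  49      12.685   7.163  -4.931  1.00  0.00",
    "ATOM      4  O   ALA A  49      13.152   7.775  -5.891  1.00  0.00",
    "ATOM      5  CB  ALA A  49      10.513   6.226  -4.137  1.00  0.00",
]

backbone = backbone_only(residue)
for line in backbone:
    print(line)
```

The side-chain atom CB is dropped, and the four backbone atoms remain.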

Exclude all hetero atoms
Exclude waters

Self-explanatory. The former option implies the latter, even if the latter is not set.
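
The relationship between the two options can be illustrated with a short sketch. It assumes that hetero atoms are the HETATM records of the pdb file and that waters are HETATM records with the residue name HOH, as is usual in pdb files; the function name filter_records and the sample lines are made up for illustration.

```python
def filter_records(lines, exclude_hetero=False, exclude_waters=False):
    """Drop HETATM records (all of them, or only waters) from a list of lines."""
    kept = []
    for line in lines:
        is_hetero = line.startswith("HETATM")
        # The residue name occupies columns 18-20 of the record.
        is_water = is_hetero and line[17:20].strip() == "HOH"
        if exclude_hetero and is_hetero:
            continue  # waters are hetero atoms, so they are dropped too
        if exclude_waters and is_water:
            continue
        kept.append(line)
    return kept


# Made-up records, truncated after the residue number.
records = [
    "ATOM      2  CA  ALA A  49",
    "HETATM 1841 ZN    ZN A 601",
    "HETATM 1842  O   HOH A 701",
]

print(len(filter_records(records)))
print(len(filter_records(records, exclude_waters=True)))
print(len(filter_records(records, exclude_hetero=True)))
```

Excluding waters removes only the HOH record, while excluding all hetero atoms removes the zinc ion and the water alike.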

Output format

For each biomolecule replicated, MakeMultimer.py produces a separate pdb file. It also shows you the specific instructions that were applied in producing it.

You can download this file, or you can look at it in FirstGlance, a Jmol-based molecular viewing website maintained by Eric Martz. This site may load a little slowly the first time around, since your computer will have to download the Jmol applet first, but the second time around it will be much faster.

Limitations of MakeMultimer.py

These are twofold:

  1. Limitations of the program, and
  2. Limitations of the quality of input data.

MakeMultimer.py only retains atomic coordinates and discards all other information in the pdb file. Hetero atoms are kept only if they have a chain identifier that matches the ones specified in the BIOMOLECULE replication matrices; otherwise, they are simply dropped.

While the REMARK 350 BIOMOLECULE records are supposed to supply biologically significant interpretations of molecular structures, in many pdb files they seem to be purely mechanically constructed, and their value and significance are not always clear. Some more background on structure files and biological multimers within them is available here.

Running MakeMultimer.py on your machine

You can download MakeMultimer.py and run it on your computer. You invoke it from the command line, like so:

python MakeMultimer.py pdbfile

If pdbfile is a local file, this file will be used. Alternatively, if it is a valid pdb identifier, MakeMultimer.py will download it from the Protein Data Bank.
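
The decision between the two cases can be pictured with the following sketch. It is not the program's own code: the function name resolve_input is hypothetical, the test for a "valid pdb identifier" (a digit followed by three letters or digits) is an assumption, and so is the download address, which follows the pattern currently used by the RCSB file server.

```python
import os
import re

# Assumption: a pdb identifier is one digit followed by three letters or digits.
PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def resolve_input(argument):
    """Decide whether the argument is a local file or a pdb identifier."""
    if os.path.isfile(argument):
        return ("local", argument)
    if PDB_ID.match(argument):
        # Assumed download address; the program may use a different one.
        url = "https://files.rcsb.org/download/%s.pdb" % argument.upper()
        return ("download", url)
    raise ValueError("neither a local file nor a pdb identifier: %r" % argument)


print(resolve_input("5cev"))
```

A local file takes precedence, so a file in the current directory that happens to be named like a pdb identifier would be read from disk rather than downloaded.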

For this to work, you need to have Python installed on your computer.

Acknowledgement

I thank Eric Martz for valuable suggestions and discussion.