How to morph proteins

(mainly using our software)

Periodically, someone writes to us about the availability of our software, or with a special request for a heavily customized morph. We prefer not to distribute the entire Morph Server, since it is closely tied to the accompanying database and is quite complex. However, we do have a very simple stand-alone CNS script that is capable of morphing anything from small molecules to entire ribosomal subunits. Read on for more; the link is at the bottom.

The complexity of the Morph Server is due to the complexity of the PDB format and the difficulty of datamining the PDB. Most of the truly novel functionality of the server is in the portions that prepare the structures for the actual interpolation. In fact, it's virtually impossible to morph any two structures from the PDB without this preprocessing, even if they're almost exactly the same. (Some structures are really difficult to morph without spending several hours modifying them manually.) That said, the Morph Server can also model pairs of proteins with significant sequence changes, e.g. the same protein in different organisms. (See the original paper for details; some of this is done in Perl, some in C, and some in the CNS scripting language. The actual morphing is nearly identical to this script.)

The stand-alone script will not do this. You must prepare your structures beforehand by making sure that the residue numbering is exactly the same in both. The script will delete any unknown residues and atoms; these can be filled in by a number of methods, most of which are either messy or time-consuming. (In the server, we just guess and then run energy minimization for 1000 steps. This is usually okay.) There is no one-size-fits-all solution to this; however, most individuals interested in generating their own movies will probably already have the necessary structure files, topology and parameter sets, etc.
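To get a quick idea of whether two structures line up before running the script, something like the following Python sketch may help. It is not part of our CNS script; it simply assumes standard fixed-column PDB ATOM/HETATM records, and the function names are made up for this illustration. It lists atoms that appear in one file but not the other, keyed by chain, residue number, residue name and atom name.

```python
def atom_keys(pdb_text):
    """Collect (chain, residue number, residue name, atom name) for every atom.

    Assumes standard fixed-column PDB records; the column positions follow
    the PDB format, the function itself is a hypothetical helper.
    """
    keys = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atom_name = line[12:16].strip()   # columns 13-16
            res_name = line[17:20].strip()    # columns 18-20
            chain = line[21]                  # column 22
            res_seq = int(line[22:26])        # columns 23-26
            keys.add((chain, res_seq, res_name, atom_name))
    return keys


def compare_structures(pdb_a, pdb_b):
    """Return (atoms only in A, atoms only in B), each as a sorted list."""
    a, b = atom_keys(pdb_a), atom_keys(pdb_b)
    return sorted(a - b), sorted(b - a)


# Example use (file names as expected by the script):
#   with open("rf0.pdb") as f0, open("rf1.pdb") as f1:
#       only_0, only_1 = compare_structures(f0.read(), f1.read())
#   Both lists should be empty before you start morphing.
```

Anything reported here is either a numbering mismatch, a sequence difference, or an atom missing from one of the files, and needs to be dealt with by hand.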

Okay, the script: it reads in two files, "rf0.pdb" and "rf1.pdb", and outputs files labelled "frame0.pdb", "frame1.pdb", etc. It will convert chains to segments when loading structures, but will only output segment IDs. This method assumes that residue numbering is sequential and always starts at 1. We recommend using the X-PLOR/CNS structure file (.mtf extension) if this is not the case, or if you have gaps in the structure. Non-standard residues and heteroatom groups will of course need their own topologies loaded.
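The numbering assumption is easy to test ahead of time. The Python sketch below is only an illustration (it is not part of the CNS script, and the helper name is invented here); it assumes standard fixed-column ATOM records and assumes that "sequential and starting at 1" is meant per chain, which you should verify for your own files.

```python
def numbering_problems(pdb_text):
    """Return {chain ID: residue numbers} for chains not numbered 1, 2, 3, ...

    Only ATOM records are examined; heteroatom groups are ignored in this
    sketch. Checking per chain is an assumption made for illustration.
    """
    residues = {}  # chain ID -> residue numbers in order of appearance
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue
        chain = line[21]             # column 22
        res_seq = int(line[22:26])   # columns 23-26
        seen = residues.setdefault(chain, [])
        if not seen or seen[-1] != res_seq:
            seen.append(res_seq)

    problems = {}
    for chain, numbers in residues.items():
        expected = list(range(1, len(numbers) + 1))
        if numbers != expected:
            problems[chain] = numbers
    return problems
```

An empty result means every chain starts at residue 1 and has no gaps; otherwise the returned numbers show where the numbering departs from that.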

Actual morphing uses a technique called "adiabatic mapping". There's a complicated physics definition of this, but in this context it can be described as

  1. Interpolate
  2. Minimize
  3. Repeat...

Interpolation is done in Cartesian space, which is sometimes a problem (atoms moving along straight lines can pass through distorted geometry, which the minimization step then has to repair). The benefit of this method, however, is that it can be executed entirely in the CNS scripting language very quickly and in very few lines of code. For a more realistic movie you could use steered molecular dynamics, but this requires far more compute power and can be extremely complicated to set up. This script takes anywhere from two seconds to 15 minutes per frame to run, with 30 frames usually translating into one second of animation. You will almost certainly want to modify the number of minimization cycles; the default is 60, which may be too many for some structures.
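The interpolate/minimize loop can be sketched in a few lines of Python. This is a toy illustration only, not the CNS script: the "energy" is a made-up bond-length restraint standing in for a real force field, the minimizer is plain fixed-step steepest descent, and each frame is interpolated directly between the two end points (an assumption; the real script may do this differently). All function names are hypothetical. The default of 60 cycles mirrors the default mentioned above.

```python
import math


def interpolate(start, end, t):
    """Linear Cartesian interpolation between two coordinate lists (0 <= t <= 1)."""
    return [tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)]


def bond_gradient(coords, bonds, length):
    """Gradient of the toy energy sum((d - length)**2) over bonded pairs."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j in bonds:
        delta = [a - b for a, b in zip(coords[i], coords[j])]
        d = math.sqrt(sum(c * c for c in delta))
        if d == 0.0:
            continue  # direction undefined; skip rather than divide by zero
        scale = 2.0 * (d - length) / d
        for k in range(3):
            grad[i][k] += scale * delta[k]
            grad[j][k] -= scale * delta[k]
    return grad


def minimize(coords, bonds, length, cycles=60, step=0.1):
    """Fixed-step steepest descent on the toy bond energy."""
    coords = [list(p) for p in coords]
    for _ in range(cycles):
        grad = bond_gradient(coords, bonds, length)
        for p, g in zip(coords, grad):
            for k in range(3):
                p[k] -= step * g[k]
    return [tuple(p) for p in coords]


def morph(start, end, bonds, length, n_frames, cycles=60):
    """Interpolate, minimize, repeat: one relaxed frame per step."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for n in range(n_frames):
        t = n / (n_frames - 1)
        raw = interpolate(start, end, t)                          # 1. interpolate
        frames.append(minimize(raw, bonds, length, cycles))       # 2. minimize
    return frames                                                 # 3. repeat


# A two-atom "molecule" with unit bond length, rotated by 90 degrees.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
end = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frames = morph(start, end, bonds=[(0, 1)], length=1.0, n_frames=5)
```

In this example the raw halfway frame has its bond squeezed to about 0.71 of its proper length, which is exactly the kind of distortion Cartesian interpolation introduces; the minimization step pulls it back to 1.0 before the frame is kept.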

We distribute the script under two conditions:

Optional conditions are to send us links to any movies you make and/or tell us about interesting new protein motions you're working on, but these are not required.

Here's the script. We welcome any feedback but this is essentially unsupported software. When you're done, we highly recommend the program PyMOL for rendering the movie.

-- Nat Echols
Copyright 1995-2003, M. Gerstein and others
Email: Mark.Gerstein@yale.edu
Last modified Feb. 7, 2003