Tutorial: Modelling post-translational modified proteins with GROMACS

Initial Draft – Under Construction – The parameterisation and implementation of dihedral angles are yet to be included.

This beginner’s tutorial will demonstrate how to modify a wildtype protein with post-translational modifications (PTM). I will be using the crystal structure of wildtype (i.e. unmodified) collagen from an earlier publication where I went through the exact same process. Per the linked publication, I’ll add an advanced glycation end-product crosslink between two of the collagen polypeptide chains. It’s possible to replicate many of these steps with any protein and any post-translation modification. What I won’t de discussing

Prerequisites

You’ll need at least the following:

GROMACS simulation software suite.
Avogadro for modifying structures.
A protein starting structure geometry (and topology file will be helpful .itp) in .gro/.pdb (pdb2gmx helps shift from one to the other, however, it will not recognise the PTM structure unless it’s been added into the corresponding forcefield – more on that later).
Details on the PTM structure, topology, forces and charges:
- The structure on paper or a cartesian coordinate list. Having derived this using quantum chemistry software, the energy minimised structure becomes the PTM equilibrium structure. We plug these into our GROMACS forcefield files and we do our best to insert this structure into our starting structure geometry (.gro) file using Avogadro (or something similar).
- Bonded forces constants for angles and lengths appropriate for the overall force field. Dihedral bond information is hard to come by and doesn’t collapse out of the Hessian of a QM calculation. I would recommend seeing what the existing force field comes with but this will depend on how unusual your PTM structure is compared with known topologies within the forcefield.
- Atom charge information. I’ll be modifying the Amber force field family so my partial charges were derived using the R.E.D Server.

I will be using the structure, force constants and atom partial charge values from the QM optimised structure of the oddly named DOGDIC crosslink from my preprint.

Preparing the forcefield

You can’t get away from modifying at least some small part of the forcefield files. We’ll be adding in a new residue name to the geometry and topology file and grompp will always look for this in the corresponding topology data files. It’s always worth putting in the extra mile and preparing a complete set of forcefield files. That way pdb2gmx can shift any modified .pdb file into a GROMACS file whilst generating the appropriate .top (itp) file. Preparing the topology will involve a lot of work in the ./share/gromacs/top/amber99sb-ildn.ff/ directory.

Preparing the forcefield files

The QM optimised structure contains the bond lengths and angles at equilibrium. However, fitting that particular structure into the whole geometry may not be possible so we’ll rely on a series of energy optimisation steps and MD trajectories to account for the changes we’ll make during construction. Sadly, unless you’ve derived your force constants with software such as ForceGen, you’ll need to measure the bond distance between every atom pain and the angle between atom triplets.

Firstly, build your structure on paper or a chemical software package. This is the structure I’ll be attaching between two collagen polypeptides. You’ll notice several features. Each atom is uniquely labelled and are used as atom types in the atomtype.atp file. I have marked, in red, the special bond used to link two residues by some part that is not the peptide backbone. In this example, DN2 and DC13 form the bond. I’ve marked in blue the bond used to attach residues along a backbone chain, and between those bonds and the special bond we have two residues: DOG1 and DOG2.

Screenshot 2019-11-10 at 10.27.36

The following files must change:

atomtypes.atp

This file holds all atom types along with the atomic mass. Pretty simple, the type is used in the second column of an .itp file after pdb2gmx and in the second column in aminoacids.rtp for pdb2gmx production. Below is a short example of some atoms from the DOGDIC structure (above).

DH1             1.00800
DH2             1.00800
DH3             1.00800
...
DO1             16.00000
DO2             16.00000
DO3             16.00000
...
DN1             14.01000
DN2             14.01000
DN3             14.01000
....
DC1             12.01000
DC2             12.01000
DC3             12.01000
...

aminoacids.r2b

Both parts of the crosslink (an arginine derived and lysine derived) will be centre residues. If you want PTM amino acids as part of the N- and C- terminus, additional entries to this file and further residue entries to aminoacids.rtp must be made. For example, valine has a central and N- and C- entries, whereas my PTM molecules will always have residues on a backbone either side.

;      main  N-ter C-ter 2-ter
VAL    VAL   NVAL  CVAL  -
MOD1   MOD1  -     -     -
MOD2   MOD2  -     -     -
GOD1   GOD1  -     -     -
GOD2   GOD2  -     -     -
DOG1   DOG1  -     -     -
DOG2   DOG2  -     -     -
GLP1   GLP1  -     -     -
GLP2   GLP2  -     -     -

aminoacids.rtp

There are alternative means of implementing a new residue type without changing this file. However, I’ve always kept a backup of the original and worked on adding post-modified side chains into the backend force field files. Do this with caution.

Each [ atoms ] row requires four columns: atom name, atom type (as found in the atomtypes.atp file), atom charge, and atom charge group. Atom type matches all forcefield atom data e.g., bonded (ffbonded.itp), nonbonded (ffnonbonded.itp) and atomic weight (atomtypes.atp). Be warned, the quick and dirty approach of using matching atom name and atom type e.g., DO1 DO1 -0.416 5 comes with a few painful problems a lot later; without the significant hassle of converting the .gro file back into a pdb file and adding CONECT data (or by supplying a PSF file with the PDF), the likes of VMD will not understand how to render the bond connectivity. However, it is also worth noting that GROMACS pdb2gmx will fail if an atom name is used more than once in a residue. Atom types can be identical, atom names cannot.

So, how do we name atoms whilst maintaining the correct atom bond rendering in VMD? It turns out, after a little trial and error, all you need to do is begin each atom name with the atomic element symbol. For example, CD is the atom name for the CT atom type, however, you could also replace it with C99 or any label beginning with a C. Please note: this was from trial and error and may work in only a few cases and I’m yet to find official VMD documentation to back this up.

The official breakdown of the aminoacids.rtp file can be found here and lots here, both are well worth the read and go beyond the scope of this simple tutorial.

As I’m dealing with a PTM crosslinked protein, I’m adding two new side chains to the aminoacids.rtp file. The first part of the crosslink:

;DOGDIC crosslink
[ DOG1 ]
 [ atoms ]
N       DN1     -0.3394 1
HN      DH1     0.2125  2
CD1     DC1     -0.0601 3
HD1     DH2     0.0916  4
C       DC2     0.5414  5
OD1     DO1     -0.4646 6
CD3     DC3     -0.0126 7
HD31    DH3     0.0216  8
HD32    DH4     0.0216  9
CD4     DC4     0.0114  1
HD41    DH5     0.0034  11
HD42    DH6     0.0034  12
CD5     DC5     -0.0006 13
HD51    DH7     0.0190  14
HD52    DH8     0.0190  15
CD6     DC6     0.1004  16
HD61    DH9     0.0323  17
HD62    DH10    0.0323  18
ND2     DN2     -0.6124 19

 [ bonds ]
N       HN
N       CD1
CD1     HD1
CD1     C
C       OD1
CD1     CD3
CD3     HD31
CD3     HD32
CD3     CD4
CD4     HD41
CD4     HD42
CD4     CD5
CD5     HD51
CD5     HD52
CD5     CD6
CD6     HD61
CD6     HD62
CD6     ND2
-C     N

[ impropers ]

The second part of the crosslink:

[ DOG2 ] 
 [ atoms ]
N       DN3     -0.4625 1
HN      DH11    0.2490  2
CD7     DC7     -0.0118 3
HD1     DH12    0.0889  4
C       DC8     0.6382  5
OD2     DO2     -0.5468 6
CD9     DC9     -0.0632 7
HD21    DH13    0.0303  8
HD22    DH14    0.0303  9
CD10    DC10    0.0127  10
HD31    DH15    0.0329  11
HD32    DH16    0.0329  12
CD11    DC11    0.0306  13
HD41    DH17    0.0837  14
HD42    DH18    0.0837  15
ND4     DN4     -0.7517 16
HD5     DH19    0.3830  17
CD12    DC12    0.7837  18
ND5     DN5     -0.6837 19
ND6     DN6     -0.6853 20
HD6     DH20    0.3616  21
CD13    DC13    0.5005  22
CD14    DC14    0.2385  23
HD7     DH21    0.0570  24
CD15    DC15    -0.0161 25
HD81    DH22    0.0576  26
HD82    DH23    0.0576  27
CD16    DC16    0.0435  28
HD9     DH24    0.0764  29
OD3     DO3     -0.7057 30
HD10    DH25    0.4552  31
CD17    DC17    0.1044  32
HD11    DH26    0.0965  33
OD4     DO4     -0.5962 34
HD12    DH27    0.4162  35
CD18    DC18    0.1135  36
H1     DH28    0.0353  37
H2     DH29    0.0353  38
OD5     DO5     -0.5938 39
HD13    DH30    0.3676  40

 [ bonds ]
N       HN
N       CD7
CD7     HD1
CD7     C
C       OD2
CD7     CD9
CD9     HD21
CD9     HD22
CD9     CD10
CD10    HD31
CD10    HD32
CD10    CD11
CD11    HD41
CD11    HD42
CD11    ND4
ND4     HD5
ND4     CD12
CD12    ND5
CD12    ND6
ND6     HD6
ND5     CD14
ND6     CD13
CD13    CD14
CD14    HD7
CD14    CD15
CD15    HD81
CD15    HD82
CD15    CD16
CD16    HD9
CD16    OD3
OD3     HD10
CD16    CD17
CD17    HD11
CD17    OD4
OD4     HD12
CD17    CD18
CD18    HD1
CD18    HD2
CD18    OD5
OD5     HD13
-C     N

[ impropers ]

The –C indicates that an atom labelled C in the preceding residue will form a bond with N from this residue. As such, you’ll also need a C in this residue for the next residue along to form a peptide bond. Alternatively, you can remove this entry and build the bond by hand in the corresponding .itp after running gmx pdb2gmx. However, pdb2gmx is likely to view this residue as standalone peptide and therefore request terminal residue information. Providing as much information pre-pdb2gmx helps mitigate potential errors later.

specbond.dat

Bonds that cross between polypeptides and don’t make up the backbone, for example, a cystine bond are defined in specbond.dat. The file is in the directory that contains all forcefield directories and thus the file’s content is shared amongst each. For example, a cystine bond:

CYS     SG      1       CYS     SG      1       0.2     CYS2    CYS2

And the bond information for my PTM crosslinks:

MOD1    NM2     1       MOD2    CM6     1       0.125   MOD1    MOD2
GOD1    NG1     1       GOD2    CG12    1       0.125   GOD1    GOD2
DOG1    ND2     1       DOG2    CD13    1       0.125   DOG1    DOG2

Make sure you use the residue atom name (not the atom type). The distance (0.125) is the bond equilibrium distance and is taken from the QM calculation.

residuetypes.dat

This file ensures that GROMACS gmx commands recognise your new residue as part of a protein (or RNA or DNA). Using the residue name, I’ve added the following entries:

DOG1    Protein
DOG2    Protein
GOD1    Protein
GOD2    Protein
MOD1    Protein
MOD2    Protein

aminoacids.hdb

This file is a pain to build but it will ensure the correct hydrogen bond behaviour when using pdb2gmx. Let’s take a look at lysine as a simple example. The number 7 next to the residue name LYS indicates 7 heavy atoms that require hydrogens to bond with. The first column (red) is the number of hydrogen bonds assigned to the first denoted heavy atom in each entry (green, that runs i, j, k and l, and depends on the bond geometry). The third column denotes atom names (purple). When more than one hydrogen atom is required (e.g., the third to fifth row) a number is added to the hydrogen name in the third column e.g., HB1, HB2, HG1, HG2… HZ1, HZ2, HZ3.

LYS     7
1       1       H       N       -C      CA
1       5       HA      CA      N       CB      C
2       6       HB      CB      CA      CG
2       6       HG      CG      CB      CD
2       6       HD      CD      CG      CE
2       6       HE      CE      CD      NZ
3       4       HZ      NZ      CE      CD

The second column is bond-type and this is where it gets a little painful:

1 – One planar hydrogen e.g., rings or peptide bonds. Spot the first line: N -C CA, hydrogen is attributed to N as you would expect.

2 – One single hydrogen e.g., hydroxyl group.

3 – Two planar hydrogens e.g., -C=CH2 or amides -C(=O)NH2.

4 – Two or three tetrahedral hydrogens e.g., -CH3. Down the side chain of lysine CD-CE-NZ are three hydrogens forming a tetrahedral with the NZ atom.

5 – One tetrahedral hydrogen e.g., C3CH. In all amino acids, the arrangement N-CA(-CB)-C forms a tetrahedral off the CA atom.

6 – Two tetrahedral hydrogens e.g., C-CH2-C. In lysine, CB, CG, CD and CE all have two type-6 hydrogens. This runs down the length of side chains in the likes of arginine, lysine, histidine etc.

(type 7, 10 and 11 are for water molecules).

The first cross-link residue:

DOG1    6
1       5       HD1     CD1     C       N       CD3
1       1       HN      N       -C      CD1
2       6       HD3     CD3     CD1     CD4
2       6       HD4     CD4     CD3     CD5
2       6       HD5     CD5     CD4     CD6
2       6       HD6     CD6     CD5     ND2

ffbonded.itp

I won’t discuss the bonded parameters in too much detail, it’s beyond the scope of this tutorial. Prior to considering MD simulations, one would have had to acquire such parameters from either the literature, experimentation, a QM step, or a series of structural optimisation steps. In my example, I am adding to the AMBER forcefield family, therefore I’ve derived my bonded parameters by calculating the Hessian of the electronic structure with ForceGen (software I wrote a few years ago). Alternatively, one could try using ANTECHAMBER, but I have little experience using it.

Whichever method you use the distance and angle force constants are stored in ffbonded.itp. Atoms are denoted using their type (not name). Columns i and j denote the atom bond connectivity, the third column is the equilibrium bond distance in nm and the fifth column is the force constant in kJ/mol.

[ bondtypes ]
; i    j  func       b0          kb
;DOGDIC
DN1     DH1     1       0.099           171548.261
DN1     DC1     1       0.144           121887.165
DC1     DH2     1       0.108           153399.965
DC1     DC2     1       0.153           89908.146
DC2     DO1     1       0.122           315482.806
DC1     DC3     1       0.154           84206.191
DC3     DH3     1       0.108           45856.944
DC3     DH4     1       0.109           44286.368
DC3     DC4     1       0.153           93448.692
DC4     DH5     1       0.108           152748.149
DC4     DH6     1       0.109           132493.413
DC4     DC5     1       0.153           95415.177
DC5     DH7     1       0.109           89672.313
DC5     DH8     1       0.109           178049.914
DC5     DC6     1       0.154           105352.500
DC6     DH9     1       0.108           110735.315
DC6     DH10    1       0.109           124359.421
DC6     DN2     1       0.144           63531.021
N       DC2     1       0.133           176436.798
C       DN1     1       0.135           162246.033
DN2     DC13    1       0.125           186307.095
N       DC8     1       0.133           176423.750
DC8     DO2     1       0.121           325547.562
DC8     DC7     1       0.153           96705.913
DC7     DH12    1       0.108           115294.316
DC7     DN3     1       0.146           116086.542
DN3     DH11    1       0.100           139423.472
DN3     C       1       0.134           190916.125
DC7     DC9     1       0.154           84120.874
DC9     DH13    1       0.108           115947.250
DC9     DH14    1       0.109           127095.404
DC9     DC10    1       0.153           105841.202
DC10    DH15    1       0.109           405220.497
DC10    DH16    1       0.108           138813.491
DC10    DC11    1       0.153           109300.079
DC11    DH17    1       0.108           107670.503
DC11    DH18    1       0.108           67798.216
DC11    DN4     1       0.146           121228.221
DN4     DH19    1       0.100           221292.284
DN4     DC12    1       0.137           160604.034
DC12    DN5     1       0.126           266534.962
DC12    DN6     1       0.138           153142.711
DN6     DH20    1       0.100           82736.086
DN6     DC13    1       0.140           99438.630
DC13    DC14    1       0.155           95552.445
DC14    DN5     1       0.146           101817.935
DC14    DH21    1       0.109           138339.658
DC14    DC15    1       0.153           53330.327
DC15    DH22    1       0.108           149115.074
DC15    DH23    1       0.108           150363.975
DC15    DC16    1       0.153           106887.426
DC16    DH24    1       0.108           85574.655
DC16    DO3     1       0.142           140524.412
DO3     DH25    1       0.095           238922.320
DC16    DC17    1       0.153           96996.331
DC17    DH26    1       0.109           123848.916
DC17    DO4     1       0.140           87739.129
DO4     DH27    1       0.095           102129.168
DC17    DC18    1       0.152           110262.387
DC18    DH28    1       0.109           74839.712
DC18    DH29    1       0.109           136564.454
DC18    DO5     1       0.139           69438.079
DO5     DH30    1       0.095           96059.354

There are too many bond angle force constants to display them all, however, here is a reduced example:

[ angletypes ]
;  i    j    k  func       th0       cth
;DOGDIC
C       DN3     DC7     1       126.671         292.174
DH23    DC15    DC16    1       107.002         203.476
DC15    DC16    DH24    1       109.717         270.997
DC11    DN4     DH19    1       114.348         107.553
DH20    DN6     DC13    1       119.864         254.587
DH22    DC15    DC16    1       110.645         96.376
DH23    DC15    DC16    1       107.002         203.476
DH13    DC9     DC10    1       111.124         193.321
DH14    DC9     DC10    1       109.541         121.168
CT      N       DC2     1       124.326         278.510
H       N       DC2     1       117.820         262.167
N       DC2     DC1     1       115.327         361.208
N       DC2     DO1     1       123.562         356.408
DO1     DC2     DC1     1       121.111         347.062
DC2     DC1     DH2     1       108.461         113.542
DC2     DC1     DC3     1       112.632         263.616

Columns i, j and k denote the atom bond connectivity with the angle centred on j, the fifth column is the equilibrium bond angle in degrees and the sixth column is the force constant in kJ/mol.

ffnonbonded.itp

As with the bonded force constants, the derivation of non-bonded forces is beyond the scope of this tutorial. I took values from atoms with very similar structural chemistry to those already parameterised in the AMBER forcefield. However, if your structure is a bit beyond a modified amino acid, parameterisation using e.g, QM energy surface scans may be required. The DOGDIC non-bonded values are:

[ atomtypes ]
; name      at.num  mass     charge ptype  sigma      epsilon
DN1     7      14.01    0.0000  A   3.25000e-01  7.11280e-01 ;*
DN2     7      14.01    0.0000  A   3.25000e-01  7.11280e-01 ;*
DN3     7      14.01    0.0000  A   3.25000e-01  7.11280e-01 ;*
DN4      7      14.01    0.0000  A   3.25000e-01  7.11280e-01 ;*
DN5     7      14.01    0.0000  A   3.25000e-01  7.11280e-01 ;*
DN6      7      14.01    0.0000  A   3.25000e-01  7.11280e-01 ;*
DO1     8      16.00    0.0000  A   2.95992e-01  8.78640e-01 ;*
DO2     8      16.00    0.0000  A   2.95992e-01  8.78640e-01 ;*
DO3     8      16.00    0.0000  A   3.06647e-01  8.80314e-01 ;*
DO4     8      16.00    0.0000  A   3.06647e-01  8.80314e-01 ;*
DO5     8      16.00    0.0000  A   3.06647e-01  8.80314e-01 ;*
DC1     6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC2     6   12.01    0.0000  A   3.39967e-01  3.59824e-01    ;*
DC3     6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC4     6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC5     6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC6     6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC7     6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC8     6   12.01    0.0000  A   3.39967e-01  3.59824e-01    ;*
DC9     6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC10    6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC11    6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC12    6      12.01    0.0000  A   3.39967e-01  3.59824e-01 ;*
DC13     6      12.01    0.0000  A   3.39967e-01  3.59824e-01 ;*
DC14    6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC15    6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC16    6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC17    6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DC18    6      12.01    0.0000  A   3.39967e-01  4.57730e-01 ;*
DH1     1       1.008   0.0000  A   1.06908e-01  6.56888e-02 ;*
DH2     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH3     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH4     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH5     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH6     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH7     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH8     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH9     1       1.008   0.0000  A   1.95998e-01  6.56888e-02 ;*
DH10    1       1.008   0.0000  A   1.95998e-01  6.56888e-02 ;*
DH11     1       1.008   0.0000  A   1.06908e-01  6.56888e-02 ;*
DH12     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH13    1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH14     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH15     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH16     1       1.008   0.0000  A   2.64953e-01  6.56888e-02 ;*
DH17     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH18     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH19     1       1.008   0.0000  A   1.06908e-01  6.56888e-02 ;*
DH20     1       1.008   0.0000  A   1.06908e-01  6.56888e-02 ;*
DH21     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH22     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH23     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH24     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH25     1       1.008   0.0000  A   0.00000e+00  0.00000e+00 ;*
DH26     1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH27     1       1.008   0.0000  A   0.00000e+00  0.00000e+00 ;*
DH28      1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH29      1       1.008   0.0000  A   2.47135e-01  6.56888e-02 ;*
DH30     1       1.008   0.0000  A   0.00000e+00  0.00000e+00 ;*

Further information on modifying forcefield files can be found at the GROMACS home page.

Building the initial geometry

I’ll be adding the crosslink to the collagen protein structure using Avogadro. I’ll then use pdb2gmx to convert the PDF file to a GRO file whilst generating the topology data.

There are a few points to note when modifying structures in Avogadro. Swapping a group of existing atoms for a different group e.g., hydroxyl for carbonyl, adds atom line entries within the position of the residue in the file. The replaced atoms will usually (there are probably exceptions that I’m unaware of) be denoted with HETATM. Adding a new residue adds atoms lines to the bottom of the file. If the residue is recognised e.g., a standard amino acid, then ATOM is used, otherwise, HETATM is used.

HETATM means hetero atom and is applied to non-standard residues of protein, DNA or RNA, as well as atoms in other kinds of groups, such as carbohydrates, ligands, solvents, and metals.

Firstly, I took a geometry optimised structure from an earlier simulation of human collagen as my wildtype. A collagen protein is made up of three polypeptide chains. This particular system has nine proteins i.e., 27 polypeptide chains. We’ll be crosslinking two polypeptide chains in one protein.

collagen

Ignore how the tightly bound collagen peptides have detached from the main fibral, what you’re seeing is a periodic boundary artefact. The file is in a .gro format and the atom layout is compatible with my AMBER forcefield of choice. To go from a .gro to .pdb I simply use editconf:

gmx editconf -f initial.gro -o initial.pdb

To sanity check that I won’t have any problems with the elements of this structure that won’t be modified I run it through pdb2gmx:

gmx pdb2gmx -f initial.pdb -o test.gro

I select the AMBER forcefield of choice and then I select no water. If this back-and-forth approach works, then you have an overall structure that won’t be causing any problems. It’s worth keeping an eye on the coordinate values. Moving between .gro and .pdb using gmx should preserve the coordinate values:

As a .pdb (stored in angstroms):

ATOM      1  N   PRO     1       0.170  20.430  13.400  1.00  0.00

As a .gro (stored in nanometres):

    1PRO      N    1   0.017   2.043   1.340  0.0040  0.1964 -0.0452

It is important to note that Avogadro preserves the coordinates when saving as a .pdf:

ATOM      1  N   PRO     1       0.170  20.430  13.400  1.00  0.00           N

This is important to remember as we’ll be removing a single protein (three polypeptide chains), modifying it, then adding it back into the overall structure.

Next, I remove the first three polypeptide chains that make up a complete protein. If I’ve taken this directly from the .gro file I’ll need to pass it through gmx editconf to get a .pdb file. When loaded into Avogadro it looks like:

Screenshot 2019-11-12 at 12.51.23

Given the box dimensions, you’ll now begin to appreciate why the final .gro coordinate file looked as though the tightly packed protein had come apart.

Begin by identifying which part of the collagen protein will be modified along with the corresponding complete residue (including the N-C(H)-C(=O) backbone). Be very careful that you remove the correct backbone atoms. Check using Atom Properties in Avogadro and the atom index in the .pdb file or rather than using Avogadro to remove atoms, simply open up the .pdb file and delete both identified residues.

Having removed the complete amino acid, I’ve highlighted the neighbouring residue backbone C Beta and N atoms.

Screenshot 2019-11-12 at 14.12.44

The empty space left behind is very noticeable when a complete residue has been removed from both neighbouring polypeptide chain (a crosslink requires two placements along the backbone).

Screenshot 2019-11-12 at 14.19.11

It’s a good idea to save the file at this point. I’ll be adding different crosslinks for further simulations at a later date.

Adding a crosslink

With the collagen prepared the next steps are creating the crosslink and adding the crosslink into the collagen framework.

Creating the crosslink is fairly simple using Avogadro’s set of Draw Tools. It’s best to do this in a separate Avogadro window so one can then perform a basic energy minimisation on just the crosslink. When the structure is prepared (save and store) copy the crosslink into the collagen framework and orientate it near the peptide backbone gap.

Screenshot 2019-11-12 at 14.44.38

Before connecting the crosslink to the collagen backbone it is very important to fix the coordinates of the collagen so they don’t adjust when we perform a simple energy minimisation in Avogadro. Using your mouse highlight all collagen atoms to the left of the gap whilst leaving a few of the immediate backbone atoms unselected. Selection Extensions->Molecular Mechanics->Fix Selected Atoms. Repeat this for all atoms to the right of the gap. Draw bonds from the crosslink to the collagen peptide. Do not perform this the other way around as we want to preserve the labelling of the neighbouring peptide backbone. Also, ensure that Adjust Hydrogens is turned OFF.

Screenshot 2019-11-12 at 14.47.03.png

With the long bonds in place, it’s time to position the crosslink in place by performing an energy minimisation within Avogadro. This may take ten to fifteen minutes depending on your hardware and on how close you positioned the crosslink to the peptide backbone.

Screenshot 2019-11-12 at 15.09.49

With the crosslink attached to neighbouring peptide backbones and inserted in place using energy minimisation steps, it’s time to manipulate the atom labelling so the file is recognised by gmx pdb2gmx. Save the new collagen structure as a pdb file.

Load the pdb file into VMD and position the content of the pdb file in a text editor (VI in this instance) and the VMD console output and the VMD display next to one another. Using the pick command we can use the mouse to left-click select on each atom. The index value plus one (VMD counts atoms from 0, much like C/C++) is the location of the atom in the pdb file.

Screenshot 2019-11-13 at 10.55.45

Go through each heavy atom (ignore the hydrogens), identify their position in the file using their index and rename the atom name according to the aminoacids.rtp entry. Don’t forget to rename the residue name from UNK to either DOG1 or DOG2. Avogadro will have placed both parts of the crosslink (as we use a special bond in specbond.dat to connect the two residues) together. After completing all atom names and residue names, the output will look similar to:

ATOM    946  ND1 DOG1   89      27.956  21.534  10.750  1.00  0.00           N
ATOM    947  CD1 DOG1   89      29.129  21.166  11.568  1.00  0.00           C1-
ATOM    948  CD2 DOG1   89      30.431  21.826  11.062  1.00  0.00           C
ATOM    949  OD1 DOG1   89      30.740  21.681   9.890  1.00  0.00           O
ATOM    950  CD3 DOG1   89      31.056  19.409  11.689  1.00  0.00           C1-
ATOM    951  CD4 DOG1   89      30.618  18.058  12.310  1.00  0.00           C
ATOM    952  CD5 DOG1   89      29.074  18.071  12.547  1.00  0.00           C
ATOM    953  CD6 DOG1   89      28.417  16.706  12.884  1.00  0.00           C
ATOM    954  ND2 DOG1   89      28.351  15.920  11.674  1.00  0.00           N1-
ATOM    955  H   DOG1   89      28.927  21.473  12.612  1.00  0.00           H
ATOM    956  H   DOG1   89      30.691  19.193  10.656  1.00  0.00           H
ATOM    957  H   DOG1   89      30.474  19.997  12.242  1.00  0.00           H
ATOM    958  H   DOG1   89      31.107  17.885  13.292  1.00  0.00           H
ATOM    959  H   DOG1   89      30.928  17.254  11.608  1.00  0.00           H
ATOM    960  H   DOG1   89      28.563  18.488  11.653  1.00  0.00           H
ATOM    961  H   DOG1   89      28.886  18.771  13.389  1.00  0.00           H
ATOM    962  H   DOG1   89      27.393  16.840  13.291  1.00  0.00           H
ATOM    963  H   DOG1   89      29.023  16.183  13.653  1.00  0.00           H
ATOM    964  H   DOG1   89      27.465  20.755  10.280  1.00  0.00           H
ATOM    965  ND3 DOG2   90      28.735  20.268   5.308  1.00  0.00           N
ATOM    966  CD7 DOG2   90      30.164  20.410   5.701  1.00  0.00           C
ATOM    967  CD8 DOG2   90      30.633  21.872   5.503  1.00  0.00           C
ATOM    968  OD2 DOG2   90      30.547  22.348   4.379  1.00  0.00           O
ATOM    969  CD9 DOG2   90      30.405  19.949   7.040  1.00  0.00           C
ATOM    970  H   DOG2   90      29.675  20.166   7.842  1.00  0.00           H
ATOM    971  H   DOG2   90      31.411  19.928   7.379  1.00  0.00           H
ATOM    972 CD10 DOG2   90      30.178  18.482   6.818  1.00  0.00           C
ATOM    973  H   DOG2   90      30.066  17.763   5.977  1.00  0.00           H
ATOM    974  H   DOG2   90      29.688  19.441   6.544  1.00  0.00           H
ATOM    975 CD11 DOG2   90      29.468  17.898   8.003  1.00  0.00           C
ATOM    976  H   DOG2   90      29.516  18.632   8.837  1.00  0.00           H
ATOM    977  H   DOG2   90      29.992  16.970   8.310  1.00  0.00           H
ATOM    978  ND4 DOG2   90      28.076  17.632   7.696  1.00  0.00           N
ATOM    979 CD12 DOG2   90      27.391  17.012   8.758  1.00  0.00           C1-
ATOM    980  ND6 DOG2   90      27.912  15.891   9.437  1.00  0.00           N
ATOM    981  ND5 DOG2   90      26.503  17.661   9.414  1.00  0.00           N1-
ATOM    982  H   DOG2   90      30.749  19.901   4.992  1.00  0.00           H
ATOM    983  H   DOG2   90      28.626  19.591   4.541  1.00  0.00           H
ATOM    984  H   DOG2   90      27.568  18.418   7.259  1.00  0.00           H
ATOM    985 CD13 DOG2   90      27.458  16.056  10.741  1.00  0.00           C1-
ATOM    986 CD14 DOG2   90      26.300  17.050  10.702  1.00  0.00           C
ATOM    987  H   DOG2   90      28.762  15.373   9.166  1.00  0.00           H
ATOM    988 CD15 DOG2   90      24.919  16.358  10.715  1.00  0.00           C
ATOM    989  H   DOG2   90      26.374  17.810  11.503  1.00  0.00           H
ATOM    990 CD16 DOG2   90      23.730  17.362  10.723  1.00  0.00           C
ATOM    991  H   DOG2   90      24.844  15.691   9.827  1.00  0.00           H
ATOM    992  H   DOG2   90      24.839  15.727  11.626  1.00  0.00           H
ATOM    993 CD17 DOG2   90      22.361  16.627  10.681  1.00  0.00           C
ATOM    994  OD3 DOG2   90      23.822  18.308   9.684  1.00  0.00           O
ATOM    995  H   DOG2   90      23.797  17.917  11.683  1.00  0.00           H
ATOM    996 CD18 DOG2   90      21.159  17.554  10.963  1.00  0.00           C
ATOM    997  H   DOG2   90      22.355  15.876  11.500  1.00  0.00           H
ATOM    998  OD4 DOG2   90      22.191  15.926   9.471  1.00  0.00           O
ATOM    999  OD5 DOG2   90      21.057  18.563   9.996  1.00  0.00           O
ATOM   1000  H   DOG2   90      20.257  19.099  10.244  1.00  0.00           H
ATOM   1001  H   DOG2   90      21.286  18.012  11.969  1.00  0.00           H
ATOM   1002  H   DOG2   90      20.231  16.938  10.957  1.00  0.00           H
ATOM   1003  H   DOG2   90      21.976  16.581   8.756  1.00  0.00           H
ATOM   1004  H   DOG2   90      23.906  17.817   8.826  1.00  0.00           H

The order of the atoms in a residue does not matter. The pdb2gmx will map the input atoms to the atoms in the aminoacid.rtp database on a residue by residue basis. Having already deleted two residues to fit the crosslink in, I divided this list by DOG1 and DOG2 placing them into their respective locations; this was essential, otherwise, pdb2gmx would form the backbone peptide bond between the wrong residues.

Generating GROMACS coordinate and topology files

This is perhaps the easiest step providing no mistakes have been made. Execute:

gmx pdb2gmx -f collagen.pdb -o crosslinked.gro -ignh

It’s essential that you ignore hydrogens (using -ignh). We didn’t adjust their atom name using VMD, rather, we are relying on the hydrogen bond database entry for DOG1 and DOG2.

Providing no errors occurred (I had a few warnings of long bonds, namely from having centre amino acids at the peptide termini; simply renaming PRO to CPRO and GLY to NGLY fixed these) you should have a .gro file and a .top file.

A final note on dihedrals. Whilst we didn’t provide any force constant values for dihedrals in the force field, gmx pdb2gmx will have built dihedral bond descriptions in the topology file. Those must be removed from the topology file if you hope to see this work with gmx grompp. However, I recommend you populate the dihedral angles using the same steps for bond length and bond angle.