I am interested in learning how to circumvent potential structural issues when simulating a protein-bilayer (atomistic or coarse-grained) model. Each step, minimisation, equilibration and production, must operate without flaw or risk of crashing.
Motivation
I am writing software that automates the complete construction, equilibration, and production simulation of hundreds of CG protein-bilayer systems. Without automation, it would be far too time-consuming for any one person to undertake. But I have a problem with preventing a system from failing to go through a complete workflow (see below).
Potential structural issues include but are not limited to steric clashes after solvating a system, inserting a protein in a bilayer, and LINCS warnings/errors from a physically unrealistic system. To be clear, my basic protein-bilayer workflow, with accompanying thoughts, might include the following steps:
Work flow
Build protein-bilayer: All stages before an energy minimisation of the complete system, including bilayer construction, protein construction, inserting the protein across the bilayer, solvating the design, adding counter ions, removing water molecules inside the hydrocarbon lipid chains, etc. A coarse-grained model is a lot more forgiving in terms of protein and lipid placement.
For a CG protein-lipid bilayer system, I will use tools such as martinize and insane to build protein-lipid bilayer systems and encode protein models into CG representations. Then I solvate the system and remove all water molecules from inside the bilayer. Usually, I’ll take the Z-coordinate of the phosphorus atom (or corresponding CG bead) in the lipid furthest out from both bilayer leaflet planes and remove all water molecules found between the two Z-values.
Lipids can clash with the placement/orientation of an integral membrane protein, so I will usually remove those lipids nearest to the protein. Often this can leave a void that quickly fills with water. Suggestions to resolve this include adding position restraints to the water during the early stages of equilibration and running a short NVT equilibration simulation; the surface area of the lipids will increase as the unit cell stays fixed.
A coarse-grained model can self assemble, however, this is not guaranteed, and I would always favour some degree of considered protein-bilayer construction over complete self assemble. There are several automated techniques, including martinize, inflateGro and mdrun.
Energy minimisation: It shouldn’t need saying, but this is an essential step. It helps resolve steric clashes and yields a structure as close to its bonded equilibrium values as possible within the confines of the minimisation scheme (and the energy minima you are inside). Coarse-grained systems can be a little more forgiving than atomistic systems. I’ve seen CG systems throw LINCS warnings and fail to converge, yet a short equilibration simulation with a small time-step can fix underlying problems.
Minimisation schemes include; steepest descent, quick and efficient, but they can bounce around a local energy minimum indefinitely; and conjugate gradient, slow to begin with but a lot more efficient closer to an energy minimum.
NVT equilibration: Keep this as short as possible for atomistic simulations (no more than 50 ps at a dt of 0.002). If left to evolve for too long, the bilayer can form pores, micelles and other artefacts. For relatively straightforward CG protein-bilayer systems, I will miss out on this stage. I will usually place position restraints (1000 kJ/mol) on all alpha-carbons for complicated CG systems and any atomistic approach. These position restraints help prevent the protein backbone from distorting under the effects of high velocities whilst allowing the side chains to adjust. I’ll use the Berendsen thermostat. (Note: You can use velocity rescaling instead. The use of Berendsen is an old habit of mine from the days before v-rescaling was introduced to Gromacs.)
NPT equilibration: This is where I introduce pressure to the system. I keep this relatively short (50 ps, maybe a little longer) with a Berendsen pressure coupling in both CG and atomistic system. In the complicated CG system and all atomistic systems, the position restraints stay on. After the simulation finishes, I use the velocities (.cpt file) as starting velocities for a further short NPT simulation whilst maintaining the position restraints. I change temperature and pressure to more appropriate schemes, e.g., velocity rescaling or nose-hoover for temperature and Parrinello-Rahman for pressure. I repeat feeding preserved velocities into short NPT simulations whilst reducing the position restraints from 1000 kJ/mol, 100 kJ/mol, then finally ten kJ/mol.
For simple CG systems, after running the Berendsen pressure coupling 50 ps simulation, I switch over the temperature and pressure coupling as mentioned above. Typically, you won’t need to systematically reduce position restraints as the CG secondary structure is predetermined.
Finally, as mentioned above, I will usually reduce the CG time-step down to 0.005 (from 0.025) and run for 50 ps if the energy minimisation stage fails to converge or threw LINCS warnings.
NPT production run: By now, I’m using the correct pressure, and temperature coupling and all position restraints are gone. The system should be free of flaw to be left to run indefinitely. Note: I’m considering whether the system is equilibrated, nor am I considering when is it appropriate to conduct analysis; those are different questions.
Final thoughts
The above was a very brief and simplified Molecular Dynamics workflow. I’ve missed a lot, for example, water to bilayer/protein concentration, ion concentration, cleaning up PDB files, metal-ligand coordination (with a whole sidechain-ligand force field parameterisation), pressure and temperature coupling parameters, and all the other mdp parameters!
In terms of building a bullet proof Gromacs protocol, one in which you knew a computer was automating everything, what would you include to avoid error and human intervention?