Welcome

Our website is made possible by displaying online advertisements to our visitors.
Please disable your ad blocker to continue.

Biomedical Engineer - Biomelucal Modelling Laboratory

Completed notes of the course

Complete course

1 BIOMOLECULAR MODELING LABORATORY 2 LEZIONE 1 [16/09] – INTRODUCTION PROJECTS Enzyme engineering -> development of methods to improve existing enzyme or new enzyme. So one application is if we have an enzyme that catalyse a chemical reaction. Enzyme usually are very useful but also very unstable so they will last for few hour. The idea is to use comp utational modelling to generate the enzyme variants that are more stable and t hen go to the lab . In this way, we just prove some of them in real life in order to save time and money. Generally the work in the lab is quite expensive and time consuming. De sign of self -assembling peptides -> one project in Milan is to design peptides that self -assemble. There are proteins that self -assemble in nature and they are not good , they are called amylose protein. For example the Alzheimer disease is due to the self -assemble into fibers of the a beta protein. Also the Parkinson disease is partially related to a self -assembling protein and the mad cow disease as well. However if we learn from nature how this ammino acids self -assembly we can use this tool to create for exam ple fibers or gels or particles that are made of peptides. So they are extremely biocompatible and we can use them for cell culture. So it will be very useful and inte re sting . Nanoplastics -> are small (5 -10 nm) plastics particles that are increasingly found in the water of the oceans. The study focus on the relationship between nanoplastics and protein. They promote the self -assembly of the protein. In particular, they work as seed where a protein start s to s elf -assemble . So they can be related to the insurgence of disease that are related to protein self -assemble. In the study, they are using molecular modelling to understand how proteins behave in the presence of this nano particles. Molecular visualization with Augmented reality -> another project is to develop tools for molecular visualization. Proteins are extremely complex , they are three dimensional object with a very complex architecture. Understanding the three dim ensional structure is very important because even small changes in the structure can lead to loss or increase of function. Also, for design ing a drug you must think that it must perfectly fit the protein. Also the drug that we use target a protein. So 95% of the drug interact with a protein or an enzyme. We will use VMD and there are other software in the virtual machine called Pynol which is also for visualisation of molecular s tructure. However the protein is in a 2 D screen and in this way you lose some information. The idea is to develop a method using augmented reality to see protein in a 3 D hologram which in theory could help to better understand the protein structure. BIOMOLECULES A biomolecule is a general term that we use to identify molecules that are presents in organism or present a biological function. There are four main class of b iomolecules: 1. Protein which are extremely important for biology. They are most study family in biomolecular modelling and major focus of this laboratory. 2. Carbohydrates 3. Lipids 4. Nucleic acids Proteins make up the 60% of our body weight. They make up for diffe rent functions: transport, chemical reaction, structure support…The fascinating thing is that they are made of the same 20 building blocks which are the amino acids. 3 PROTEIN This are the key elements of our course, the amino acids, they are 20. They are called in three different ways: they have a full name, three letter code and single letter code. Here there are 21 because there is also included silocysteine which is an amino acid but is extremely rare so it is in most cases not consider as one of the 20 standard amino acids. Amino acids are all made of five elements: carbon , oxygen , nitrogen , hydrogen and just a few also contains sulphur . They are just a small part of the pe riodic table. It is important that we start to recognise the colour code: generally in all visualization software you will see atoms coloured in a standard way. Hydrogens are white, nitrogen are blue and oxygen is red. Generally carbon are light blue but s ometimes red. The amino acids are made of four different sections. We have an ammino group, a carboxyl group, a side chain and an alpha carbon. The ammino group, the carboxyl group and the alpha carbon are all share with all 21 amino acids. The only differ ence between the amino acids is the side chain which is connected with the alpha carbon which is the center of the amino acid. We can group the amino acids by their characteristics. One of the major characteris tic of amino acids is how the side chain inte racts with water. We can find several amino acids that are hydrophobic (they don’t like water). Some are polar or hydroph ilic (they like water). Other are charged which are also hydrophilic. There is also glycine which ha s as side chain an hydrogen atom. Hydrophobic amino acids can be recognized visually because the side chain is mostly made of aliphatic chain or rings. They are made mostly of carbon atom and they repel water. Hydrophilic amino acid present oxygen or nitrogen atom in the side chain and th ey are able to form bond with water. Charged amino acids can be negatively or positively charged. They strongly interact with water and also with each other (negatively charge and positively charge attract each other). PRIMARY STRUCTURE The sequence of a protein is encoded in the DNA. Every three letters in the DNA code are codes for one amino acid. Inside the cell when there is a mRNA , the ribosome add amino acids at the chain of the protein. The amino acids are connected through amino and carboxyl group. The carboxyl group of one amino acid is connected to the amino group of the subsequent amino acid forming a peptide bond with a realise of one water molecule. Protein are complex three dimensional structure but they are built as a line ar structure. So they are just a string and then by it -self form a three dimensional structure. This a section of a protein which is constructed. We can recognise what is called protein backbone , the inner structure of the protein. This is made by the re peating sequence of nitrogen, alpha carbon (where the side chain is attached), the carboxyl group carbon and the oxygen. Then i t start s another amino acid with the same element s. This sequence of nitrogen, carbon, carbon makes the protein backbone. 4 SEC ONDARY STRUCTURE The protein than folds into three dimensional structure. There are secondary structure which are local three dimensional structure which the protein form s. Two of the major secondary structure are: alpha helix and beta sheet. Alpha helix connects amino acids that are close in the sequence. In beta sheet the amino acids that are faced in this secondary structure are not necessarily close in the sequence. Alpha helix are always a wright handed helix and there are hydrogen bonds that will make it stable. Hydrogen bond are weak electrostatic interaction between a donor (which is usually oxygen or nitrogen), a hydrogen in between and an electro negative atom whi ch is often oxygen or nitrogen at the other hand. The oxygen in this case has a partial negative charge, the hydrogen has a partial positive charge so there are a weak electrostatic interaction. The single electrostatic bond is w eak but there are many of t hem connecting the backbone of the alpha helix so they make the alpha helix stable. Same goes for the beta sheet, it is also stabiliz ed by the hydrogen bonds between the atoms in the backbone of the protein. In all the secondary structures the side chain doesn’t play a role. TERTIARY STRUCTURE The protein folds it -self into a global three dimensional structure that we called tertiary structure. This is primarily made by the side chain. In general we cannot predict given a sequence what will be the three dimensional structure of a protein. Computational modelling are very helpful but we know the general rules and so we cannot predict a folding. We know that one of the major driver is that hydrophobic amino acids w ant to stay inside the core (away from the water ). While hydrophilic amino acids want to stay outside in the external shell and in contact with water. QUATERNARY STRUCTURE In some cases several chains joint together forming a quaternary structure. Not all protein form a quaternary structure. The classical example is h aemoglobin which is made of four different protein s. PROTEIN FUNCTION  Defence -> antibodies are proteins that recognize some part of a virus, for example, and they targeted it for its destruction .  Transport –> some pro teins are embedded in the cell wall and they connect the external and internal part of the cell. Basically they decide who is going inside the cell. The sodium calcium pump are one example. This protein allows to calcium ions to go inside the cell membrane but not for example magnesium ions. The role of the transport protein is to distinguish between calcium and magnesium ions and allow only to the first one to go inside the cell.  Communication – > one very important protein is insulin. It is a small protei n that is reali sed in the blood streams and it connects to other protein of the cell surface. It basically tell the cell to open the gate and to enter the glucose inside. If there is a problem in the production or the recognition of the insulin there is a disease called diabetes.  Storage –> ferritin stores iron atoms inside the body which are otherwise toxic. Also it realised the ions when it is necessary.  Enzymes – > are a class of proteins that accelerate chemical reaction. They basically decrease the e nergy barrier from going to the reagents to the products. They are extremally important 5 for many biological purpose. One example is the a mylase which is an enzyme presents in or saliva that cut complex carbohydrate chains.  Structure –> protein can provide structure. Every tissue in our body has a sort of mechanical function. For example collagen is our building material and it self -assemble into complex structure. At the end it can make , for example , tendons which are 90% collagen. MODELLING We started sever years ago with the ability to manipulate things at the very macro scale. So getting something that is in the order of cm. then as technology progresses we have been able to micro machining and getting something that is in the mm or micro meter and we are now in the Armstrong range (nano meter range). So it is very useful to have a tool to design also biological molecules at the nano scale. We are not yet there b ut we are going there. The idea of why we need biomolecular modelling is that we have a lot of tools to design macro engineered element (hip prothesis, stent etc.). Some example are Finite Element Modelling and Computational Fluid Dynamics. In this way we can design a product. Of course we need to do prototype and testing in the real world but we are pretty confident that what we design is already good. However for biological products, enzymes, we are not there. The design of biological molecule is up to now mostly based either on trial and error (make some changes and see what happens) or random (you do thousands of random mutations and see what happened, this is called directed evolution). In the last case they use bacteria that are replicated by making errors and they introduce mutation in the protein like normal evolution but accelerated. 99,9% of this mutations are deleterious so they won’t help to make the protein more stable. but maybe 0,01% maybe be produce interesting protein, so you select this st rain that may produce the stable protein and you do another cycle of evolution. It is directed because you select which evolution line goes on. This is very expensive, long and not necessarily successful. The idea is to also for the nano scale get a tools that allows to design at the computer a new protein or a modify protein and then go to the lab with a few that you think are successful. Molecular modelling you can see as a method investing atoms and molecule or as a sort of microscope that allows you to see molecules in a way that is not possible with experiment. You have a basically nano meter resolution and nano second resolution in time. So there are some pros of molecular modelling: 1. Time and space resolution that are difficult to achieve with experimental techniques . 2. Total control in the chemistry condition. You can change a molecule in more or less the way you like. You can add mutation it is only a computer model. So you don’t need to find complex chemical reaction and purification methods to get that modification in the lab . 3. This are generally time and cost efficient at least relatively to wet -lab. There are, of course, some limitation: 1. The same advantage of time and space is also a limitation. We will see in the lab th at molecular simulation are computationally expensive. So generally for standard computational resource we can simulate thing in the order of micro seconds and in the nano meter range. We cannot do a simulation of an entire cell, it is too big and to o comp lex so dynamics would take forever. Of course using very large computational resources you can expand a little bit but you cannot still simulate things in the order of mm. 2. At least for the method that we will see ; we w on’t see chemical reaction. The chemi cal structure of our proteins is fixed. We won’t talk about quantum mechanics and quantum chemistry so there will not be bond creation. Everything is fixed form that point of view. 6 3. We will see that the time and space limitation is also a limitation in the sampling of energy surface. If you have energy barrier that has to overcame, it may take too long to overcame. For example, in the cell , protein s start as linear string s and then self -assemble into a three dimensional structure. W e could generate a protein model that is linear and do a simulation. In principal that would work but in practice there are energy barrier to overcame for some turn to make or some folding to form and that would take too long for a large protein. We can predict the folding for smal l peptides. WHY MODELLING For example, I’m in Milan and I want to go from point A to point B. this is a map of Milan. This is the reality. Of course knowing the reality that for our context could be a quantum mechanics. We can describe in principal the all physics at the purest form. this makes not that helpful to do from point A to point B. I can managed but it is very complex. Or I can use metro map. So that makes much easier to know how to go from one point to the other through the con nections. In our case the metro map is of course the model. It is a n idealization or simplification but capture some essentials. In our case, for example, distances or relative orientation are distorted but that doesn’t matter because I really not need to know the distance between two metro stations. What I need, at least for my purpose, are just the connections. I know where I start, where I want to go and what are the connections. Again, the map misrepresents distances and directions but displays the co nnectivity. So a model, at least a successful model, is something that distort some element of reality but capture some feature that are relevant for my purpose. There is not a universal model that is “one size fit all”, that is good for everything. We nee d to know in advance what is our goal and if the method that we are using is relevant for our purpose. So if we have a set of data, we can have very simply model that doesn’t really capture the physics, the truth behind the data. Or we can have a ver y complex model which is just overfitting of the data. So it’s not really helpful to predict beyond the data. A good model is something that capture the essential trend. It may distort some elements but it can be helpful to predict beyond the data that has been parametrized. BRIEF HYSTORY OF (BIO)MOLECULAR MODELLING  It started in 1957 with Alder and Walnwrigth that used a super computer (at the time). The computational power is like one of our calculator. The use it at the IBM and they use it overnight do to some test of a molecular simulation of argon. 7 They model argon as a hard sphere and this is the force field that they have used. So the energy is a function of the distance between the particles. They are hard sphere so the energy is zero, they can just move around, if they bump int o each other there is a spike i n the energy (there is a purely elastic interaction).  In 1964, Rahman published simulations of liquid argon using a more complex potential (Lennard -Jones potential); calculations of system properties, such as the coefficient of self - diffusion, compared w ell with experimental data.  In the 1970 Karplus, Levitt and Warshel developed methods that combined quantum and classical mechanics to calculate the courses of chemical reactions using computers. They achieved the Nobel Prize in chemistry 2013.  1969 War shel developed the Consistent Force Field (CFF) so that the minimum energy structure could be found after the energy terms had been appropriately calibrated.  1969 Levitt used the CFF for the refinement of protein conformations using a macromolecular energ y minimization procedure .  1977 first molecular dynamics simulation of a small protein. The simulation concerned Bovine pancreatic trypsin inhibitor (BPTI), which has served as the first case of protein dynamics because of its small size, high stability an d relatively accurate x -ray structure. APPLICATIONS OF BIOMOLECULAR MODELLING We can study the structure of a protein. We can start whit a structure that is obtained experimentally (crystallography, electro -microscopy…). We can study the changes in the structure of a protein that is in an environment with a temperature higher than 0 K. We can see how the protein evolve in time. Also we can use biomolecular modelling to refine the structure of a protein . In fact the structure could not be determine d for each amino acids, there could be missing parts. We can use protein model to generate models of the missing part and use molecular dynamic to refine these models. We can do perturbation, so see how the structu re evolve if we add some external stimulus. For example, we can add a ligand to a protein for see how it changes. We can also introduce a chemical mutation in the structure or add a mechanical force, an external potential 8 or change the solvent and see how the protein evolves. We can study processes like the transport across a membrane and look at the details of how ions crosses the protein structure. So we can identify the steps, the barriers. For example we can study how a mutation leading to a disease, w hich is the molecular mechanism, why that mutation leads to a disease. We can study protein folding so from a linear sequence of amino acids we can study process of folding into a three dimensional structure. Or the process of ligand binding. How a drug o r a ligand connect to a protein and what are the effects on the protein. 9 LEZIONE 2 [23/09] – EMPIRICAL FORCE FIELD QUANTUM MECHANICS Quantum mechanics represents the true reality as far as we know (so it represents the satellite image of Milan). It is not so much use because it is too complex. It is not very helpful. At the level of the atom the Newton laws are not valid and we describ e the trajectories of particle by as sociating each of them with a wave function solution of Schrodinger equation. The problem is that the Schrödinger equation is too complex to solve and we can use it only for a single hydrogen atom. An ab initio molecula r dynamics (i.e., based on QM) has the advantage of describing in detail the electronic, chemical and physical properties of the system without placing any strong initial hypothesis on the chemical behavior of the individual species involved. We can use nu merical approximation and get an approximate solution of Schrodinger equation . In this way we can extend the system that we can create and solve using quantum mechanics . However using current computational possibility what we can describe using quantum mec hanics are number of atoms that are in the order of hundred s. Also, phenomena that are in the order of pico second. On the other hands the beauty of quantum mechanics is that it represents the reality and we do not make simplification or assumption . If we want to describe a system that are in the protein dimension (micro second to milli second and nano meter in range ) we have to get away with quantum mechanics and we have to do some approximation in order to extend our modelling range . The approximation that w e use is empirical force field. The main approximation , that is implied in force field method , is to ignore the electrons and calculate the energy of a system as a funct ion of the nuclear position only. In this way we cannot describe phenomena that depend s on them. Principally the creati on and formation of chemical bond because this is due to the sharing of electrons. In some cases force field methods can provide answers that are as accurate as even the highest -level quantum mechanical calculations, in a fraction of the computer time. However, molecular mechanics cannot of course provide properties that depend upon the electronic distribution in a molecule. FROM QUANTUM MECHANICS TO CLASSICAL MECHANICS Going from a quantum description to a classical description where empirical force field are based on classical mechanics. This relays on few approximation s: 1. Born -Oppenhe im approximation –> since the dynamic of electrons and nuclei are on completely different time table; we can coupled them and consider them independent. W e can consider the dynamics of nuclei separately from the dynamics of electron s. Nuclei are much slow er than electron s. 2. We can treat nuclei as a point particle that follow classical Newtonian dynamic . In classical molecular dynamics, the effect of the electrons is approximated as one potential energy surface, usually representing the ground state. THE FORCE FIELD After these approximations, w e can describe in principal the energy of the system dependi ng on the position of nuclei (we don’t consider the electrons). 10 The force field way to describe the energy of the system is to define it as a sum of energy turns that describe the interaction between atoms. In particular, we have a turn that define the str etching of a possible bond. We have a turn that define the deformation of an angle between three atoms. A turn that describe the rotation of an angle (interaction between four connecting atoms). Then two terms that describe the electrostatic interaction a nd the Van Der Walls interaction (the non -bonded interactions). In the force field we define the best interaction, the low energy interactions, and any deviation from this reference state leads to a penalty. PROPANE EXAMPLE Let’s consider a propane with eleven atoms. Let’s count how many interaction we have to consider to model this molecule. We will have a number of bonds (10). Then there will be angles between three atoms (18). Then we have torsions, so we have few bonds that can rotate (18). Then we have non - bonded interactions which are Coulomb interaction and Van Der Walls interaction. Non -bonded interactions are consider only for atoms that are separated at least by three bonds. In total we have 27 non -bonded interaction. Already for a small molecule of 11 atoms we have a number around maybe 50/60 interactions to consider. FUNCTIONAL FORM + PARAMETERS The force field is the specific functional form that we use to describe the energy of a molecular system (the equation that we use to de scribe each interaction ) and the set of parameters. For example, for bonding interactions, we use a s a model a string. So you have an elastic constant that describe how strong the bond is and a reference length which describe which is the equilibrium leng th of the bond. The force field is the combination of the system equation and the parameters. There are several force field, w e will use one called Charmm during the laboratory but there are other s (Amber, Gr omos..) . There is not a true or exact functiona l form or parameters because they are empirical. So the best force field is the one that is better suited for the application that you are trying to full filled . It is possible that , at least in part , force field has the same functional form (the one for bonds are the same) but they may have different parameters because they model different aspect of proteins. They , in theory , can also have the same parameters but different functional form (this is mor e difficult). 11 It is not correct in principal to take parameters from one force field and applied to another because the parameters and the functional form go together. T he parameters are specific for th at functional form so they may not work as well for another for ce field . TRANSFERABILITY Really important feature of force field is transferability. So each and every molecule will have a set of combination of chemical bonds that is different. If we want mo del each and every molecule it will be too much. The idea of the force field is that the y can be transfer from one molecule to the other. If I model the parameters for the oxygen -hydrogen bond these should be the same if I treat a water molecule or the oxy gen -hydrogen bond in propanol. If I find the best parameters that are a good parameters to describe the chemical bond between oxygen and hydrogen, in principal I can apply these parameters for each and every molecule that has this bond. So if I find the b est parameters to describe the chemical bond between oxygen and hydrogen, in principal I can use this parameters for each molecule that have the same bond. It is helpful because proteins are made of 20 amino acids so if I optimize the parameters to describ e each of the 20 amino acids, then I am able to describe each and every protein. It is basically infinite the number of protein using 20 amino acids and using different length of the chain . In some cases it maybe needed to create specific parameters for a very specific molecule. If I want to model a drug that it is interacting with my protein, then it will be good to develop a very specific parameters for that specific molecule. This is a task that require time and effort but we are very interest in that specific molecule. So it will be helpful to do. But it is a very specific topic. FORCE FIELD ARE EMPIRICAL Another characteristics that we have to consider is that the force field are empirical. The parameters that we will see are find by t ry and error. Find the best parameters that describe at the best the experiments or quantum mechanics calculations. So t here is no functional form or parameters that are the truth sense, they are approximation. The functional form is a compromise between the accuracy and the computational cost ; the most accurate functional form may often be unsatisfactory for efficient computation. As the performance of computers increases so it becomes possible to incorporate more sophisticated models. The all concept of e mpirical force field is a compromise between accuracy and computational cost. Otherwise if we want to model the truth nature of physic we have to use quantum mechanics. ATOM TYPE Another aspect that is important to consider is the one of atom type. The atom type is more than just the atomic number of an atom, it usually contains information about its hybridization state and sometimes the local environment. So n ot all carbon or oxyg en are the same. In the force field we may have several atom types that are different from the element. We have an oxygen as a n element but we may have four different oxygen like in the example. These four different oxygen have different parameters and the y depend on the chemical environment of the atom. For example , in this force field, an oxygen that is connected with a single bond to another atom is called 02 and it will have a set of parameters for bonds, length and so on. An oxygen that is bonded to a nother atom with a double bond, it will have a different atom type with a set of different parameters and so on. Oxygen connected to an 12 hydrogen will be another atom type and oxygen connected to two other atoms that are not hydrogen will be another atom ty pe. We now discuss in some detail the individual contributions to a molecular mechanics force field, giving a selection of the various functional forms that are in common use. BOND STRETCHING If we want to model the stretching of a chemical bond the best approximation of quantum mechanical calculations we can use it is the M orse potential. The Morse potential has the form: This is the energy as a function of separation. You see there is a min im um for a specific distance, then the energy increase an d it goes to zero for larger distances that are infinite. If a squish the two atoms together the energy increases. However, this functional form has three parameters (e, a and l0). So although it is the best approximation for the quantum mechanics calculat ion s it is rarely use d because in biomolecular modelling is very unlikely to stretch single bond . Moreover, it is rare in molecular mechanics calculations for bonds to deviate significantly from their equilibrium values. We don’t really need this fine approximation. In most cases (99,9% of the time) in force field bond stretching are described by the Hooke law (F = kx) like a string bonded two atoms. The reference bond length is the value that the bond adopts when all other terms in the force field are set to zero. The forces between bonded atoms are very strong and considerable energy is required to cause a bond to deviate significantly from its equil ibrium value. This is reflected in the magnitude of the force constants for bond stretching. This has the advantage to have only two parameters and if I have to find all the permutation of bonds between atom type, saving one parameter to find is good . Also for the computational point of view, this functional form is much faster to calculate than the Morse potential. It is really a brute approximation. Th ere is basically no connection between the harmonic and Morse potential expect near the minimum. As said in most cases we are not stretching bonds or we are not considering chemical reaction so modelling the vibrational bonds near the minimum is exactly wh at we need in biomolecular simulation. This is the best approximation we can use considering the compromise between accuracy and computational cost. As an example you have a set of parameters for different atom types. There is a single carbon -carbon bond which has a reference length of 1,5 A and an elastic constant of 317 kcal mol -1 A-1. A double bond has a reference length which is shorter and the elastic constant is more or less the 13 double. This makes sense because double bond are stronger and the atoms are closer. The force field will contain a set of these parameters. ANGLE BENDING The second term is the angle bending, the deviance of angles between three atoms from the reference value. Also in this case the best compromise between accuracy and compu tational cost that has been found is again the Hooke law. Rather less energy is required to distort an angle away from equilibrium than to stretch or compress a bond, and the force constants are proportionately smaller For each type of angle we will have an elastic constant which describe how difficult it will be to change the angle from its reference value and the reference value . TORSIONAL TERM Most of the variation in structure and relative energies is due to the complex interplay between the torsional and non -bonded contributions. For the torsional term we cannot use Hooke law because of the periodicity. When the two hydrogens in a ethane are aligned there will be a maximum in the energy. While there will be a minimum when the two hydrogens are not aligned. If you continue to rotate you will find again in the aligned situation and so on. If you do that for 360° you will have three minimums and three maximums. That is why the torsional term are describe usually by a functional form that consider a cosine function . In this case you have four terms: 1. Vn is the barrier height. How strong is the interaction, how difficult is to go from the stag gered situation to the eclipsed or aligned situation. The higher the value of Vn, the higher is the energy. 2. Periodicity (n) meaning how many maximums or minimums you have in 360°. In the case of ethane you have three minimums and three maximums. 3. Omega is the angle 4. Gamma is the phase factor where is the initial minimum. How the functional form is shifted on the x axes. It determines where the torsion angle passes through its minimum value For the torsional term you will need three parameters. The functional form that we have described up to now are found almost everywhere. IMPROPER TORSION TERM In most cases but not always there is also an additional bonded term (improper torsion term). Suppose you have this chemical structure (ketone) , with three carbons and one of them connected through a double bond with an oxygen. If we use the parameters for bond and angles we will obtain a structure that is with the oxygen away from the plain defined by the three carbon. 14 However, experimentally if you get the crystal structure of this chemical, you see that the oxygen l ays in the plain formed by the three carbons. In some case s you can exploit an additional improper term (dihedral term) which consider four atoms , connected by three subsequent chemical bond , that are not in line and force the atom s to lay on the same plain. There are different functional form but again the most used is the Hooke law. You have an elastic constant and a reference angle formed by the four atoms. in this way it is possible to set phi0 to 0 and with a high elastic constant we can force the atoms to stay in the same plain. NON -BONDED INTERACTIONS – EL ECTROSTATIC INTERACTIONS We have interaction between atoms that are from different molecules or are far away in the same molecule. There are two main non -bonding interaction: the electron static interaction and Van Der Walls . For the first one, e ven if a molecule is electrostatically neutral (like water), atoms can have different electronegativity. In particular oxygen is highly electronegative so tend to attract electrons. While hydrogen is not electronegative. So within the molecule electrons, t he charge, is not even distributed. We can assign to each atoms of the molecule partial charges which are empirical . In fact there are not in reality partial charge s. The electrical charge of the molecule is zero but we have a dipole inside the molecule. In this way there are a partially negative charge on oxygen and a partially positive charge on hydrogen. This meaning that if we have several molecules we have electrostatic interactions between atoms that are on different molecules. The electro static in teraction are calculated using the Coulomb law : NON -BONDED INTERACTIONS – VAN DER WALLS INTERACTIONS Electrostatic charge are not sufficient to describe non -bonded interaction within a system . If you consider two helium atoms which are noble gas. A molec ule of helium is made of a single atom of helium. There is no possibility to have a partial charge, the charge of helium atom is zero. However they have interactions. They have the Van Der Walls interaction which has the functional form represented in the picture. It has a minimum for a specific lengt h, goes to zero for an infinity distance and increases at infinite for shorter distance. It is the combination of two dif ferent fundamental interactions. An attractive contribution and a repulsive contribution. The attractive contribution is due to the formation of instant dipole within the atoms. The electron cloud surrounded the nucleus is not static so it changes over tim e. You can have a temporary distribution which is not uniform. For example, a molecule can have a more negative charge on one side and an atom near -by it is influenced by this distribution. Since electron repelled them self you will have less density in th e electron cloud of the atom near the first one mention. This process goes on for all the atoms in the system. So t his leads to an attractive interaction . Then if we squish together the two atoms we have a strong repulsive interaction . This is due to the Pauli principal that says that two atoms don’t want to overlap. 15 You have two separate phenomena, one repulsive and one attractive. When you combine them together it leads to this function: One of the most used function form to describe this Van Der Wall s interaction is the so called Lennard Jones 12 -6 function . It has two parameters: 1. Epsilon which is the energy depth. So the higher the value of epsilon, the stronger the interaction is between two atoms and more difficult is to move the two atoms from the equilibrium distance. Remember that this two atoms are not chemically bond, are just clos e. 2. Sigma is the value where the potential reaches zero. So the term with the 12 exponential is part of the curve where the limit towards zero is infinite and then for larger distance the term with 6 exponential dominate the function. For larger distance th e energy goes to zero because they don’t see each other anymore. NON -BONDED INTERACTIONS – VAN DER WALLS INTERACTIONS – MIXING RULES There are hundreds of atom types so there are maybe 10 carbon, 10 hydrogen, 10 oxygen and so on. So it will be in princi pal a huge number of pairs to consider and to parameterized. It is too much because there will be too many interactions, too many sigma and epsilon to find. The approximation that is used is mixing rules. Imagine we need to consider the non -bonding interac tion between an oxygen and a carbon. We will need to find the sigma and the epsilon for an oxygen and a carbon. The way we do this is to use mixing rules. So the sigma of the Van Der Walls interaction between atom A and atom B is just the arithmetical mean of the sigma value for the interaction between atom A and atom A plus atom B and atom B. It is the average value of the sigma when two carbons interacts and two oxygens interact. Same goes for the epsilon value. This means that if you have 100 atom types you need to find 100*2 parameters and not all the combination of 100. This saves a lot of times and this is found to be a good approximation. THE FORCE FIELD 2 We describe the force field. Now knowing the position on each and every atom of the system and having found the best parameter to describe, we can define the potential energy of a mole cular system of atoms. there are different force field for biomolecular simulations, during the course we will use C HARMM but there are other like Amber, Gromos or OPLS . They have different set of parameters , different set of function form and they are opt imized in different way. But for most application they are interchangeable and you can use one or the other. PROPANE EXAMPLE 2 Now we go back to the propane example and we said that we have a certain number of bonds, angles, torsions and non -bonded interactions. But how many parameters we have to find? In this example, the carbon won’t be the same. This two carbons which he named CA are different from the carbon in the middle. This CA are connected to a methyl group while the other is connected to only two other hydrogens and two carbons. So they will behave differently. Same goes for the hydrogens. HA can be modelled with a differe nt atom type respect to HB. 16 For bond terms we need find 6 parameters. Because we have three different type of bonds. We have the bonds between carbon A and carbon B. We have HA to CA which is the same throughout all the molecule (property of transferability). Then HB and CB. So we ha ve three type of bonds and each of the bond is using the Hooke law. In this way we have two parameters for all the bonds: elastic constant and reference length. We have 4 different type of angles ( HA -CA -HA, HA - CA -CB and so on), since angle term are model with two parameters we need to find 8 parameters to model the propane molecule. W e have just 2 types of torsions. The torsion between HA -CA -CB -HB or HA -CA -CB -CA. if you start anywhere else and you connect four atoms you will find that they are the only co mbination s. Of course they are symmetric so the terms HB -CA- CB -HA is exactly the same. Torsions need three parameters so we have six parameters in total that we need to calculate. For non -bonded parameters remember to consider only the atoms that are away at least three bonds. So we have HA -HA , HA -HB , HA -CA and in this case do not consider cou lomb interaction but let’s consider only Van Der Walls interaction. Each interaction needs two parameter and then we need in total 6 parameters. We only find parameter for HA -HA, HB -HB, CA -CA and then using mixing rules we determine the HA -HB or HA -CA using the average mean of sigma and of epsilon. Anyway the number doesn’t change. To describe a small molecule the number of parameter we need to find is relatively small . The interesting thing is if we model a larger molecul e; the number of interactions increase but the number of parameters , if we don’t include fancy atom types , remain manageable. 17 LEZIONE 3 [ 30/09 ] – SETUP MD (MOLECULAR DYNAMIC) SETUP A MD SIMULATIO N We will see how to setup, how to prepare the structure of a molecule in order to run a molecular dynamic simulation. This is the basic step to how setup and run a molecular dynamic simulation. We will cover toda y approximately half of them. The steps are: 1. You have to define an initial configuration of your system . It could be a protein structure that you downloaded from the protein databank . Or it can be a model generated in some other way. 2. Then you have to decide how to define the system in term of energy and interactions . It means to define which force field is suited for our purpose . We have to consider for our application if there is a force field that is better than the other. If you want to model a ligand binding or a protein folding maybe there are force field that are better parametrized for that specific application. We have to consider, also, if there are special chemicals modify a mino acids, ligands and so on in our system and if in the force field we want to use there are parameters for that specific element . if we are modelling a protein with only the 20 natural amino acids we have a larger choice. Otherwise we have to select dif ferent force field. 3. Then we need to define how to treat water . So all macro molecules are somehow surround by water; in our body or in a solution they are not dry. We need to add water and consider the solvent. I n 99% the solvent is water but in theory you can do also with a different solvent. 4. Then you have to decide how to treat boundar ies . Our system is not infinite, it will have a finite size. We need also to know how to treat long range interaction . In the fo rce field we have basically two types of interactions: bonded and not bonded. The first one are short range , they only consider atom that are couple of two, three chemical bonds apart. W hile the second one are long range . So, in principal, there are intera ctions between atoms that are very far away. Both interactions tend to zero for infinite distance s but they are never actually zero even at long distances. 5. Then we have to define MD parameters : temperature, pressure, constraints, external force s (stretch, compress the all protein or some part of the protein?) . Then we can really start the MD simulation. Basically t o see the evolution in time of our molecular system. It is usually divided into two phases which are not necessarily two separate simulation s: 6. The equilibration phase –> where we run the molecular dynamic simulation to reach equilibrium . In the preparation of the molecular system we did an approximation. For example, we included hydrogen atoms that are not included in crystal struct ure. These are added in a pseudo random position, they are guessed based on our knowledge of structure of amino acids. They are put to a certain distance, with a certain angle respect to heavy atom s. Heavy atom s are all other atom s then hydrogen. The posit ion, the distance, angles and so on of the hydrogens atom need to be equilibrated. Also water is included surrounding the protein but it is not in the best position (for example to make hydrogen bond with the amino acids ). We need a first phase of equilibr ation. Another reason is that crystal structure 18 are obtained by creating a crystal of a protein. So highly concentrated protein and crystal are approximate to 0 K to be shot with x -ray. The structure we see in the protein data bank are obtained at close t o 0 K. W hen we run a MD simulation we want to see the behavior of a protein at room temperature 300 K or at body temperature which is 310 K. W e need to let the molecule find the better shape, better structure at room or body temperature. When we decide tha t the equilibration is achieved we can run the production run or phase. 7. Production phase –> at the end of the equili bration phase commences the production phase. This is the actual dynamic simulation in which we will calculate properties to help to define the structure, we define how the ligan d interacts with our molecule and so on. The output of the production phase is a trajectory file. There are other outputs, we will see all of them. But t he main output of molecular dynamics is a trajectory file, the o ne that we will saw last week (DCD file on NAMD . It may have different extensions using different software but they are all snapshot of the protein during time). 8. Finally the simulation is analyzed , we can do different type of analysis. INITIAL CONFIGURATION First we need a starting structure, in most cases we prefer to start with experimentally (crystallography, x -ray, NMR) obtained structure and we can find it in the data bank. When he did the snapshot there were in total in the data ban k 1677 80 structures. However, the number of unique protein is much lower. For example, for hemoglobin or albumin you may have different structure of the same molecule. They may have a mutation, different ligands bound to it. It is called protein data bank becau se it started with protein and most of the structures are proteins. However, there are also structures of DNA and other biological molecules. If you look for albumin, you find tons of structures. You can scroll down the structures and you will see the ba sic information of each individual structure. you see a unique code and a four alpha numerical letter code which identify uniquely the structure. Then the title associated with the structure, the authors of th is structure and the publications . In combination with the position on the data bank there will be a paper associated in wi tch it is describ ed in detail the structure (when it was realized, which is the method and the resolution ). The lower the resolution the better the structure. A single b ond is in the order of 1 A, so with a resolution of 2 A you can see with an approximation individual atom. Knowing that amino acids have all different si de chain s and the y are clearly distinguishable at this resolution, it is possible to define the structu re of the protein. If you go to higher value of resolution (4 -5 A) it will became more difficult to define the position of the side chain s. Side chain s are more mobile and so when you see higher resolution generally you may see only the backbone of the pro tein. The general shape and not the position of the side chain s. It is also possible that some parts of the structure are missing. You may see that not all the amino acids are present in the structure. Then you see the organism from which the protein is ex tracted , what kind of protein and if there are ligands. Then if you click on each individual structure you see more in details the structure it -self. So there are a lot of information and here you can download two main files: the FASTA sequence and the PD B structure. The firs t one is a text with the sequence of the protein with a single letter code for each amino acids. The second one is what we need for the molecular dynamic simulation. 19 Unfortunately not all protein are available in the protein data ban k. It is really difficult if you have only the sequence of the protein and you want to do the modelling of a protein. In principal if we just start with the sequence of amino acids and run a molecular dynamic simulation , it will fold into a three dimension al structure. However, some of the protein mutate too much at least for the computational power of a computer. It can predict the folding of protein that are quite small. What we can do is to use a tool called homology structur e. For example, if we want to model a human albumin and there is no human albumin structure but there is the structure of the goat albumin. It is likely that there is a high homology between the two structures. So say 99% of the sequence of the amino acids are identical or homo logous. There is two different type of similarity between sequences: 1. Identity –> you have a different amino acid in the same spot . 2. Homology –> you have a similar amino acid but not the same . So it has the same characteristics but it is not exactly the same. If you have a high degree of homology ( there is no unique threshold but it is common to consider a 80% homology sufficient to start ) then you can assume that the structure is highly similar and you can use a technique called homology modelling. This is because the primary structure influences the tertiary structure. You start with a target sequence, the sequence that you want a model for and a template structure. This a structure from the pr otein data bank which has a sequence that closely match your target sequence. You do a sequence alignment (align the amino acids of the template sequence and the target sequence) and there are software that create a homology model. This software creates a n hypothetical structure of your protein. In this case it is very important to analyze the structure and the result of the modelling with very care. Also it is very important the equilibration phase and to make the protein finding the energy minimum. AD D MISSING ATOMS Before running the simulation you need also to add some atoms. In all crystallographic structure there are no hydrogen s. So we have to add hydrogen atoms and the re may be also other missing atoms (maybe in the side chain which are extremely mobile) . The more manufactur es you need to do with the starting structure the more important the equilibration phase will be. If we need to add all side chains, then we need to do a proper equilibration before we can say that the structure is properly equilibrated and we can do analysis on it DEFINE FORCE FIELD Now that we have the starting structure we need to define the force field that we want to use . So we have to define which set of parameters we will use and which functional form we will be using for our simulations. We need to:  Define how atoms are bonded in our structure (topology) .  Assign to each atom the atom type . Remember that the atom type is different form the atom element. Not all oxygen are th e same and in the PDB you just know that there is some oxygen in position x, y, z. To properly run the simulation and to calculate the interaction you need to define the atoms type of each atom in the system.  Define partial charges . All atoms will have a partial charge even if the all molecule is neutral (remember water). In other words, you have to define the electrostatic interaction s. 20  Define the parameters to use which are highly connect ed to the atoms types that you have used for each atom. WATER MODEL In general all biomolecules are surrounded by water. They are in general unstable outside a water element. Biological sample s are surrounded by water but also by other elements . For example, in the experimental lab the y are not in distilled water , th ere will be a buffer with a certain PH. In most molecular dynamic simulation we just use the equivalent of distilled water maybe with some salt. To treat water you can choose between two classes : 1. Implicit water –> the only particles that are present are th e one that describe the protein. Water is considered continuum by changing the epsilon in the electrostatic interaction parameters. This is very fast because if your protein is 1000 atoms you have 1000 particles in your system. However it is also rather in accurate because you miss the possibility to form hydrogen bonds between protein and water. That may be an important factor in the stability and behavior of the protein. In most cases , when possible, an explicit water model is used. 2. Explicit water –> in this case also water is treated as a collection of particles. Unfortunately, water usually takes 90% -80% of the number of atoms in the system. Because you need to surround your protein by a layer of water and so you will have a lot of water particles. For a protein of 1000 atoms you may have 10000 atoms that belong to water just to proper model the environment of the protein. When you run a molecular dynamic simulation of a protein, you basically running a MD simulation of water with a little bit of a p rotein. There are also different water model s. The basic and most commonly used water models are the SPEC/E and TIP3P (means you have three particle: one mimicking the hydrogen, one the oxygen and another for a hydrogen ; with a specific length, angle and partial charge assign to them) . However if you want to model a very accurate water behavior you can use also advanced models of water. For example, TIP4P which has an extra particle to better mimic the experimental dipole moment of the water molecule. Howe ver , even if there is only an extra particle, as said, water molecules takes 90% of the atoms in the system. So instead of considering three particles per water molecule you consider four particle for water molecule . You add a lot of atoms in your system for which you need to calculate interactions for. This is used only in very specific application. PERIODIC BOUNDARY CONDITIONS We surround a protein with water, but ho w many water molecules? If you build a layer of water molecules surrounding the protein, the protein may diffuse and then being in contact with the external layer. It is like the protein is in contact with the vacuum. So you have to create a huge layer of water, in this way durin g the simulation the protein can’t reach the external layer of the environment. However , that is really computationally inefficient because you have a lot of water that it is just there for mimicking the environment. One clever solution is to use periodic boundary condition. We start to build a box of water surround ing the protein (10 Armstrong in length more or less). We use this box to create a periodic system. This box is replicated infinite time s in all directions. 21 Say that you have a protei n in the center . Then if the protein moves, it will never reach the edge of the box because there is another box with water molecules there. In this way we model a protein like it is in bulk water without the need to include an excessive number of explicit particles. Since it is periodic, we don’t need to define the movement of all the particles in periodic system. Only the movement of the particles in the original system , because if an atom moves then all its periodic imagines move in the same way. This is another sketch of the periodic image. You have this box that is replicated in all directions and if a particle move in the original box, all periodic imagines move in the same way. What happen if a particle exit the box? It goes back into the box it -se lf on the other end. It is like a periodic image enter s in the other way. CUTOFF Even if the particles that we have to consider for the displacement are a finite number; the number of interaction is theoretica lly infinite now (long interaction). Because if you consider to have a particle in the cycle, you don’t have bonded interaction so all particles interact with non -bonded interactions. But a particle here do a electrostatic and Van Der Walls interactions with all the particles even the periodic imagi nes. Long range interactions tend to zero for infinite distances, in this case we have an infinite system (the periodic system is infinite in all directions). In principal there are an infinite number of interactions we have to take in account that will co ncern long ranging interactions. In fact for bonded interactions that is not a problem. Because if we have two particles that are bonded, they are the same for all the box. We need to find a solution. The solution is to use cutoff. It is especially useful for Van Der Walls interaction which tent to nearly zero for a long distance. so we can approximate at zero to save some computational time. The cutoff and the cell sizes are connected by what it is called minimum image convention. The id ea is that one particle should not interact with the same particle twice. So it should not see a particle and its periodic imagine. It means that the periodic box size should be at least 2 time the cutoff distance or vice versa. Since the cutoff distance i s dictated by when the interaction goes to zero, it is usually the cell size that depend on the cutoff distance. Cells_size > 2*cutoff distance So we define a cutoff of 10 A and we consider interactions only between particles that are within 10 A. In prin cipal, this doesn’t save any computational time because first we need to know if two particle are within 10 A or not. We need to calculate the distance within all particles (even period images) to know if they are within the cutoff or not. Since the partic les move we cannot define that at the begging of the simulation and then just use that. A solution is to define a neighbor list. 22 Beyond the cutoff we can define a neighbor radius which is larger than the cutoff distance to create a sort of list of neighb or. Particles that are within the cutoff are consider for the calculation of the interactions. For the particle P we will calculate the electrostatic and Van Der Walls interactions with all the particles that are white (because they are within the cutoff r adius). Then we take the number and the ID of all the particles that are in the neighborlist radius (particle in blue). These are particles that are beyond the cutoff, we won’t calculate the non -bonded interactions for the particle P and the blue particles . But they are close to the cutoff radius so we need to check at each step if the blue particles are within or outside the cutoff radius. If we choose the two radius in a clever way we don’t need to update this neighbor list each step. General ly c