USTC/Repressor Evolution in Silico

From 2007.igem.org

Revision as of 10:27, 17 October 2007 by ZhanJian (Talk | contribs)


Introduction

Protein Design

Designing proteins with high selectivity for a broad range of processes would be of tremendous practical interest to both science and industry [1]. Computational design has attracted considerable attention for its efficiency and convenience. The redesign of protein-DNA complexes is important enough to have been a concern since the earliest work on active-site redesign, and several milestones have been reported [2-5]. Here we try to construct several artificial repressor-operator pairs to serve as the connecting wires of our system.

There are several steps in protein design [6]. First, generate a random structure with a random sequence. Second, optimize the side-chain structure for each random sequence. Third, score each sequence and select the one with the best score. In almost all cases the number of candidate sequences is too large to enumerate: redesigning 20 positions gives 20^20 candidates (about 10^26). Great effort has therefore gone into reducing the computational complexity [1,7-14]: representing the state of each side chain with a few discrete conformations, called rotamers; scoring sequences with pairwise energy functions; and searching with Monte Carlo, genetic algorithms and similar methods. The other key ingredient is the score function. So far no in silico method has been given for checking design results. One way to check them is to express the designed sequences in a real system.
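The size of the search space and a Monte Carlo search over sequences can be sketched as follows. This is a minimal illustration only, not the code used in this project: the Metropolis acceptance rule, the temperature, the step count and the toy score function (with its made-up target sequence) are all assumptions, and a real run would score each trial sequence after side-chain optimization.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def n_candidates(n_positions):
    """Number of sequences when each position may take any of 20 residues."""
    return len(AMINO_ACIDS) ** n_positions


def monte_carlo_search(score, n_positions, n_steps=2000, temperature=0.2, seed=0):
    """Metropolis search for a low-scoring sequence (lower score = better).

    `score` is any callable that maps a sequence to a number; the
    temperature and step count are illustrative assumptions.
    """
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in range(n_positions)]
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(n_steps):
        # Propose a single-point mutation at a random position.
        trial = list(current)
        trial[rng.randrange(n_positions)] = rng.choice(AMINO_ACIDS)
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return "".join(best), best_score


def toy_score(seq, target="ACDEF"):
    """Hypothetical stand-in score: number of mismatches to a made-up target."""
    return sum(a != b for a, b in zip(seq, target))


if __name__ == "__main__":
    print(n_candidates(20))  # size of the 20-position search space
    print(monte_carlo_search(toy_score, n_positions=5))
```

Exhaustive enumeration is hopeless for 20 positions, which is why stochastic searches of this kind are used; for only a handful of positions, enumeration is feasible again.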

Lac Repressor

The lac repressor is a DNA-binding protein that inhibits the expression of genes coding for proteins involved in lactose metabolism in bacteria [15]. The protein has three distinct regions. The headpiece, a fragment of about 51 amino acids at the N-terminus that contains a helix-turn-helix motif, is the only region that binds DNA. Figure 1 shows the NMR structure of the complex of two lac repressor headpieces with the natural operator [16]. A series of experiments has shown that mutations at the 7th and 9th sites of the operator sharply decrease the stability of the complex [15-17]. Figure 2 shows the structure of the DNA at sites 7, 8 and 9 and the residues interacting with them. From the structure we can see that residues 17 and 18, which are YQ in the native repressor sequence, are responsible for recognizing the specific DNA sequence. The binding specificity may therefore be altered by changing these two positions.

For the wires to work without interference, the repressor-operator pairs must have high binding specificity. It has been reported that a mutated repressor with VA at positions 17 and 18 can bind an operator carrying a transition mutation at site 7 [18]. A transversion changes the DNA so much that its structure cannot be built from the native structure, and no mutated lac repressor can bind it. We therefore made only transition mutations on the DNA. Each structure was built manually from the native structure and then optimized with the GROMOS96 43a1 force field. In this way we obtained four DNA structures, and our aim is to redesign the recognition residues of the repressor so that it binds each DNA specifically.
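A transition swaps a purine for the other purine (A and G) or a pyrimidine for the other pyrimidine (C and T); any other substitution is a transversion. The sketch below illustrates this at the sequence level only. The operator string in the example is a made-up placeholder, not the lac operator, and the function names are hypothetical.

```python
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
PURINES = {"A", "G"}


def is_transition(a, b):
    """True if replacing base a with base b keeps the purine/pyrimidine class."""
    return a != b and ((a in PURINES) == (b in PURINES))


def transition_mutant(operator, site):
    """Return the sequence with a transition at the given 1-based site."""
    i = site - 1
    return operator[:i] + TRANSITION[operator[i]] + operator[i + 1:]


if __name__ == "__main__":
    placeholder = "AACCGGTTAACC"  # hypothetical sequence, for illustration only
    print(transition_mutant(placeholder, 7))
```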

Computational Model

Score function

Our score function is a combination of physics-based and knowledge-based potentials. The energy terms must be modified to suit the rotamer library. A modified van der Waals interaction, the hydrogen bond energy, the solvent-accessible surface area (SASA) and the electrostatic interaction are used as the terms of the score function.

Linearized van der Waals interaction

The vdW interaction energy [19] between atom i and atom j at distance r is:

USTC formula vdw.png

Representing protein side chains with discrete rotamers produces more clashes than are present in the native structure. We tolerate some clashes by linearizing the repulsive term of the vdW energy. The parameters are determined by testing.
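One common way to soften clashes is sketched below. It assumes a standard 12-6 Lennard-Jones form with well depth eps and minimum-energy distance r0, and replaces the potential below a switch distance with its tangent line. Both the functional form and the switch fraction of 0.9 are assumptions for illustration; the formula and the tested parameters actually used here are not reproduced.

```python
def lj(r, eps, r0):
    """12-6 Lennard-Jones energy with minimum -eps at r = r0."""
    x = (r0 / r) ** 6
    return eps * (x * x - 2.0 * x)


def lj_slope(r, eps, r0):
    """Derivative dE/dr of the 12-6 energy above."""
    x = (r0 / r) ** 6
    return 12.0 * eps * (x - x * x) / r


def linearized_lj(r, eps, r0, switch=0.9):
    """Lennard-Jones energy with a linear repulsive wall.

    Below r_s = switch * r0 the steep r^-12 wall is replaced by the
    tangent line at r_s, so overlapping atoms get a finite penalty.
    The switch fraction is an illustrative assumption.
    """
    r_s = switch * r0
    if r >= r_s:
        return lj(r, eps, r0)
    return lj(r_s, eps, r0) + lj_slope(r_s, eps, r0) * (r - r_s)


if __name__ == "__main__":
    for r in (2.0, 3.0, 3.6, 4.0, 5.0):
        print(r, lj(r, 0.2, 4.0), linearized_lj(r, 0.2, 4.0))
```

The tangent construction keeps the energy and its slope continuous at the switch distance while capping the penalty for close contacts.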

Hydrogen bond energy

Given the distance between the donor and the acceptor, the hydrogen bond energy is:

USTC formula hbond.png

Describing side-chain structure with rotamers also changes the donor-acceptor distance, and the position of the hydrogen atom cannot be obtained easily because it rotates. We therefore build a knowledge-based potential based on the donor-acceptor distance in the coarse-grained structures.
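A distance-dependent knowledge-based potential is often derived by inverse Boltzmann statistics: observed donor-acceptor distances are binned, and each bin gets an energy of -kT ln(observed frequency / reference frequency). The sketch below assumes that approach with a uniform reference state, a pseudocount and an arbitrary bin layout. None of these choices is stated for the potential built here, and the distances in the example are invented.

```python
import math


def distance_potential(distances, r_min=2.0, r_max=4.0, bin_width=0.2,
                       kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann potential over donor-acceptor distance bins.

    Returns one energy per bin; frequently observed distances get low
    energies. The uniform reference state and the pseudocount are
    illustrative assumptions.
    """
    n_bins = int(round((r_max - r_min) / bin_width))
    counts = [pseudocount] * n_bins
    for d in distances:
        if r_min <= d < r_max:
            index = min(int((d - r_min) / bin_width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    reference = 1.0 / n_bins  # uniform reference frequency per bin
    return [-kT * math.log((c / total) / reference) for c in counts]


if __name__ == "__main__":
    # Invented distances (in angstroms), for illustration only.
    observed = [2.9] * 50 + [3.5] * 5
    for i, energy in enumerate(distance_potential(observed)):
        print(round(2.0 + 0.2 * i, 1), round(energy, 3))
```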

SASA and Electrostatic Interaction

The energy terms for SASA and the electrostatic interaction are those defined by T. M. Handel and co-workers [8].

Sampling

We optimize the side chains of the protein and select sequences with different score functions, because the two are conceptually different processes. Side-chain optimization gives the three-dimensional structure of the complex after the amino acids in the protein are changed, while amino acid sequence selection tells us which sequence is better suited for binding. The role of each energy term is not the same in the two processes. The parameters for side-chain optimization were obtained by training on 49 structures picked from the PDBbind library [20], and those for sequence selection by training on the results of directed evolution.
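The idea of using the same energy terms with two different parameter sets can be sketched as a weighted sum. The linear form, the term names and every numeric weight below are placeholders chosen for illustration; the trained parameters are not given here.

```python
TERMS = ("vdw", "hbond", "sasa", "elec")

# Hypothetical weights: one set per stage, the same energy terms in both.
PACKING_WEIGHTS = {"vdw": 1.0, "hbond": 1.0, "sasa": 1.0, "elec": 1.0}
SELECTION_WEIGHTS = {"vdw": 0.5, "hbond": 2.0, "sasa": 0.0, "elec": 1.0}


def weighted_score(energies, weights):
    """Linear combination of energy terms under the given weight set."""
    return sum(weights[name] * energies[name] for name in TERMS)


if __name__ == "__main__":
    # Made-up energy terms for one candidate.
    energies = {"vdw": -2.0, "hbond": -1.0, "sasa": 0.5, "elec": -0.5}
    print(weighted_score(energies, PACKING_WEIGHTS))    # used to place side chains
    print(weighted_score(energies, SELECTION_WEIGHTS))  # used to rank sequences
```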

Results

In this particular problem only 2 positions on the protein are changed, so there are only 20 × 20 = 400 candidates and computational complexity is no longer a problem. We optimized the side chains of each candidate and obtained each energy term. Using the directed-evolution results, we found suitable parameters for sequence selection and predict that some other sequences may also work for this problem. Experimental tests are in progress.
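With two variable positions the candidates can simply be enumerated and ranked. The sketch below lists all residue pairs for positions 17 and 18 and sorts them with a caller-supplied score. The toy score is a hypothetical stand-in that only counts mismatches to the reported VA variant; it is not the trained score function, and no ranking from it should be read as a result.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def enumerate_pairs():
    """All two-residue combinations for positions 17 and 18."""
    return ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def rank_candidates(candidates, score):
    """Sort candidates from best (lowest score) to worst."""
    return sorted(candidates, key=score)


def toy_score(pair, reference="VA"):
    """Hypothetical stand-in: number of mismatches to a reference pair."""
    return sum(a != b for a, b in zip(pair, reference))


if __name__ == "__main__":
    candidates = enumerate_pairs()
    print(len(candidates))
    print(rank_candidates(candidates, toy_score)[:5])
```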