Calgary/evoGEM projectDesign
From 2007.igem.org
(One intermediate revision not shown) | |||
Line 1: | Line 1: | ||
<html> | <html> | ||
+ | <!-- | ||
+ | evoGEM Project Design | ||
+ | This page describes the design of the evoGEM project | ||
+ | All information on this page was adapted from Boris Shabash, the creator of evoGEM | ||
+ | --> | ||
<head> | <head> | ||
+ | <!-- | ||
+ | Cascading Style Sheet (CSS) component | ||
+ | in plain html style sheets are completely independent of the HTML page but are included in this page to simplify design within a wiki script environment. | ||
+ | This section describes the layout, colors and images displayed on the page | ||
+ | --> | ||
<style> | <style> | ||
Line 65: | Line 75: | ||
</head> | </head> | ||
<body> | <body> | ||
+ | |||
+ | <!-- | ||
+ | evoGEM Title | ||
+ | The title and and top level navigation links | ||
+ | --> | ||
<table class="header"> | <table class="header"> | ||
<tr> | <tr> | ||
Line 72: | Line 87: | ||
</tr> | </tr> | ||
</table> | </table> | ||
+ | |||
+ | <!-- | ||
+ | evoGEM main menu | ||
+ | The navigation menu for the evoGEM section of the site | ||
+ | --> | ||
<table class="links" > | <table class="links" > | ||
<tr> | <tr> | ||
Line 78: | Line 98: | ||
<td align="center" ><a class="mainLinks" href="https://2007.igem.org/Calgary/evoGEM_results" title="Results" >Final Results of EvoGEM</a> </td> | <td align="center" ><a class="mainLinks" href="https://2007.igem.org/Calgary/evoGEM_results" title="Results" >Final Results of EvoGEM</a> </td> | ||
</table> | </table> | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | <p><b>EvoGEM and The Registry</b></p> | + | <!-- |
+ | The following paragraphs introduction some of the background behind evoGEM | ||
+ | Introduction to evoGEM and Vigo::3D | ||
+ | --> | ||
+ | <p> EvoGEM was developed using C++ and a graphics agent engine known as Vigo::3D.is a library for multi-agent simulation and visualization in 3D space that was developed at the University of Calgary by Ian Burleigh. The code for Vigo::3D is open source at available at <a href="http://vigo.sourceforge.net/docs/dox/html/main.html" tilte="vigo code">http://vigo.sourceforge.net/docs/dox/html/main.html</a>. By using Vigo::3D the EvoGEM project has been able to generate clear graphical representations of the systems that are evolved. This gives a nice qualitative visual view of how the system works. </p> | ||
+ | <div style="float:left; border:#CCCCCC thin outset; padding:2px; margin:18px;"> <img src="https://static.igem.org/mediawiki/2007/f/f3/EvoGemSim3.gif" alt="EvoGEM simulation results" /> </div> | ||
+ | <p style="margin:10px; font-size:18px"><b>The Simulations</b></p> | ||
+ | <p> The image shows a still screen shot from one of the movies generated by EvoGEM. The large fairly transparent purple spheres represent RNA Polymerases. These polymerases are preprogrammed in with characteristics that define their ability to bind to the promoter, which is the darker green bar at the end of the main circut. The lighter green portion corresponds to a ribosome binding site, the purpleish pink component represents the protein coding region and lastly the red part at the end of the circut is a terminator. The definitions of the other spheres floating around represent various proteins and other factors whose function will be dependent on the type of simulation being run. </p> | ||
+ | <p> The parameters of the simulation are determined before running the simulation by setting the values in a <em>configuration file</em>. These files determine the nature of the simulation. The configuration files allow the user to define, among other tihngs: The number of generations. The target products the system should evolve. And the mutation rates of each generation | ||
+ | </li> | ||
+ | </ul> | ||
+ | <p style="margin:10px; font-size:18px"><b>EvoGEM and The Registry</b></p> | ||
<p>As of yet EvoGEM lacks the ability to connect with iGEM registry and dynamically search for parts. As such our system works off of a "mini registry" file that contains approximately 200 parts. Parts from the registry are described in a way that is easily understandable by humans but very hard for a computer system to work with. Therefore the parts in our mini registry have been manually composed by reading through the part descriptions, determining which parts of the description have value to the simulation and then defining them in fields that our simulation can interpret. The result was the development of several descriptive fields which our system will recognize values for and then make the appropriate selections. Some of our fields are:</p> | <p>As of yet EvoGEM lacks the ability to connect with iGEM registry and dynamically search for parts. As such our system works off of a "mini registry" file that contains approximately 200 parts. Parts from the registry are described in a way that is easily understandable by humans but very hard for a computer system to work with. Therefore the parts in our mini registry have been manually composed by reading through the part descriptions, determining which parts of the description have value to the simulation and then defining them in fields that our simulation can interpret. The result was the development of several descriptive fields which our system will recognize values for and then make the appropriate selections. Some of our fields are:</p> | ||
<ul> | <ul> | ||
- | <li> | + | <li> <em>Type</em> - what type the part is. ie Promoter, RBS ect </li> |
- | <em>Type</em> - what type the part is. ie Promoter, RBS ect | + | <li> <em>Part</em> - The ID of the part </li> |
- | </li> | + | <li> <em>Input</em> - What the part inputs to our system </li> |
- | <li> | + | <li> <em>Output</em> - What the part outputs from our system </li> |
- | <em>Part</em> - The ID of the part | + | <li> <em>Inducer</em> - What is required to induce expression of the part </li> |
- | </li> | + | <li> <em>Represser</em> - What will repress expression of the part </li> |
- | <li> | + | |
- | <em>Input</em> - What the part inputs to our system | + | |
- | </li> | + | |
- | <li> | + | |
- | <em>Output</em> - What the part outputs from our system | + | |
- | </li> | + | |
- | <li> | + | |
- | <em>Inducer</em> - What is required to induce expression of the part | + | |
- | </li> | + | |
- | <li> | + | |
- | <em>Represser</em> - What will repress expression of the part | + | |
- | </li> | + | |
</ul> | </ul> | ||
- | <p> | + | <p> EvoGEM uses these values to assess potential parts for the evolution of the desired ciruct. </p> |
- | EvoGEM uses these values to assess potential parts for the evolution of the desired ciruct. | + | <p style="margin:10px; font-size:18px"><b>Fitness Function</b></p> |
- | </p> | + | <p> Our system works by using a <em>fitness function</em> to assess whether or not an evolved ciruct is "good". There are two possible solutions that can create a fittness |
- | <p><b>Fitness Function</b></p> | + | function that is fexible enough to account for the desired properties: </p> |
- | <p> | + | <p> The frst of those is to create a function that will take all |
- | Our system works by using a <em>fitness function</em> to assess whether or not an evolved ciruct is "good". There are two possible solutions that can create a | + | desired behaviors as an input and will grant a better fitness |
- | function that is fexible enough to account for the desired properties: | + | score to a circuit based on those. That means that if a user |
- | </p> | + | inputs that he or she wants a circuit that produces molecules |
- | <p> | + | A, B and C and oscillates between them, the function will |
- | The frst of those is to create a function that will take all | + | assign points based on the existence of molecules A,B and |
- | desired behaviors as an input and will grant a better fitness | + | C and upon seeing oscillation. Of course, that requires the |
- | score to a circuit based on those. That means that if a user | + | behavior of oscillation to be defined in a way the system will |
- | inputs that he or she wants a circuit that produces molecules | + | be able to identify it and quantify different sorts of oscillations (based on amplitude, wavelength, etc'). This becomes |
- | A, B and C and oscillates between them, the function will | + | a major drawback as the behaviors requested become more |
- | assign points based on the existence of molecules A,B and | + | and more complex since more and more complex coding is |
- | C and upon seeing oscillation. Of course, that requires the | + | required. However, a major advantage is that the evaluation |
- | behavior of oscillation to be defined in a way the system will | + | and selection of circuit individuals becomes fully automated |
- | be able to identify it and quantify different sorts of oscillations (based on amplitude, wavelength, etc'). This becomes | + | and once the process is set there is no need for further user |
- | a major drawback as the behaviors requested become more | + | interference until the appropriate solution has been found. </p> |
- | and more complex since more and more complex coding is | + | <p> An alternative method would be to use a human being as |
- | required. However, a major advantage is that the evaluation | + | the fitness function. The person would be displayed with the |
- | and selection of circuit individuals becomes fully automated | + | data of each circuit after a certain number of iterations and |
- | and once the process is set there is no need for further user | + | he or she would choose individuals based on the data shown. |
- | interference until the appropriate solution has been found. | + | As sophisticated of a fitness function a human can be, there |
- | </p> | + | are several drawbacks that should not be overlooked. The |
- | <p> | + | first one is that such an approach would require consistent |
- | An alternative method would be to use a human being as | + | involvement of the user with the program. In that case, |
- | the fitness function. The person would be displayed with the | + | the only advantage gained from the software is the speed |
- | data of each circuit after a certain number of iterations and | + | at which it considers the behavior of each circuit and the |
- | he or she would choose individuals based on the data shown. | + | display format. Second, the user will only be able to consider |
- | As sophisticated of a | + | several individuals at every generation simply because of |
- | are several drawbacks that should not be overlooked. The | + | the amount of data associated with each circuit (molecules |
- | first one is that such an approach would require consistent | + | produced, their quantities over time, etc'). This makes the |
- | involvement of the user with the program. In that case, | + | population very small and the selective process less effcient. </p> |
- | the only advantage gained from the software is the speed | + | <p> Since the goal of this project is to bring EvoGEM to the |
- | at which it considers the behavior of each circuit and the | + | point where it can consider simple behaviors (synthesis of |
- | display format. Second, the user will only be able to consider | + | specific molecules) the first approach was chosen for developing our fitness fuction. </p> |
- | several individuals at every generation simply because of | + | <p style="margin:10px; font-size:18px"><b>Algorithm</b></p> |
- | the amount of data associated with each circuit (molecules | + | <p> The main algorithm that will evaluate the different circuits |
- | produced, their quantities over time, etc'). This makes the | + | will look at each circuit in its environment. The function |
- | population very small and the selective process less effcient. | + | would take into account two main components: Whether or |
- | </p> | + | not did the circuit produce the requested molecules, and the |
- | <p> | + | length of the circuit (the shorter, the better). The reason |
- | Since the goal of this project is to bring EvoGEM to the | + | for the second argument to be taken into account is since |
- | point where it can consider simple behaviors (synthesis of | + | the software is supposed to simulate circuits that are to suc- |
- | specific molecules) the first approach was chosen for developing our fitness fuction. | + | ceed in vivo. Longer DNA circuits have a lower success rate |
- | </p> | + | in being integrated into bacteria than shorter ones. There |
- | <p><b>Algorithm</b></p> | + | can be two possible ways to reward circuits according to |
- | <p> | + | their production of desired molecules: First, a reward pro- |
- | The main algorithm that will evaluate the different circuits | + | portional to the number of molecules can be given to the |
- | will look at each circuit in its environment. The function | + | circuit (e.g. 1 point/ molecule produced, so 100 molecules |
- | would take into account two main components: Whether or | + | give 100 points). A second approach would be to give a |
- | not did the circuit produce the requested molecules, and the | + | constant reward for each molecule type. In such a case, if |
- | length of the circuit (the shorter, the better). The reason | + | the user requests for two molecule types, A and B, and the |
- | for the second argument to be taken into account is since | + | circuit produced 50 units of molecule A and 150 units of |
- | the software is supposed to simulate circuits that are to suc- | + | molecule B, they both give that circuit a constant reward |
- | ceed in vivo. Longer DNA circuits have a lower success rate | + | (e.g. 100 points for each type). Which of these two methods |
- | in being integrated into bacteria than shorter ones. There | + | of rewards is superior to the other, if at all, will have to |
- | can be two possible ways to reward circuits according to | + | be determined experimentally. </p> |
- | their production of desired molecules: First, a reward pro- | + | <p> Another feature of the function to consider is whether or not |
- | portional to the number of molecules can be given to the | + | the selection of offspring for each consecutive generation is |
- | circuit (e.g. 1 point/ molecule produced, so 100 molecules | + | done in a greedy matter (e.g. always choose the best 2) or |
- | give 100 points). A second approach would be to give a | + | in a matter more similar to a roulette wheel (e.g. choose 2 |
- | constant reward for each molecule type. In such a case, if | + | with a higher chance of choosing the best ones and a lower |
- | the user requests for two molecule types, A and B, and the | + | chance of choosing the worst ones). The reason for such a |
- | circuit produced 50 units of molecule A and 150 units of | + | consideration is the fact that some molecules have a pathway |
- | molecule B, they both give that circuit a constant reward | + | that allows them to be synthesized. For example, suppose |
- | (e.g. 100 points for each type). Which of these two methods | + | we have a protein A that is a complex, i.e. it is composed |
- | of rewards is superior to the other, if at all, will have to | + | of several different sub-units of proteins.Suppose one needs |
- | be determined experimentally. | + | proteins B, C and D to combine and create A. That means |
- | </p> | + | we need a bio-brick the produces B, one that produces C and |
- | <p> | + | one that produces D for any chance to produce A. However, |
- | Another feature of the function to consider is whether or not | + | the circuit than becomes longer, and so loses points, before |
- | the selection of offspring for each consecutive generation is | + | the final product is obtained, and a reward is given for it. |
- | done in a greedy matter (e.g. always choose the best 2) or | + | In this case, if the system keeps choosing the best circuits, |
- | in a matter more similar to a roulette wheel (e.g. choose 2 | + | it could never utilize the unfinished circuits that lead to the |
- | with a higher chance of choosing the best ones and a lower | + | formation of molecule A. The greedy approach very quickly |
- | chance of choosing the worst ones). The reason for such a | + | fails once the problem becomes more complex. </p> |
- | consideration is the fact that some molecules have a pathway | + | |
- | that allows them to be synthesized. For example, suppose | + | |
- | we have a protein A that is a complex, i.e. it is composed | + | |
- | of several different sub-units of proteins.Suppose one needs | + | |
- | proteins B, C and D to combine and create A. That means | + | |
- | we need a bio-brick the produces B, one that produces C and | + | |
- | one that produces D for any chance to produce A. However, | + | |
- | the circuit than becomes longer, and so loses points, before | + | |
- | the final product is obtained, and a reward is given for it. | + | |
- | In this case, if the system keeps choosing the best circuits, | + | |
- | it could never utilize the unfinished circuits that lead to the | + | |
- | formation of molecule A. The greedy approach very quickly | + | |
- | fails once the problem becomes more complex. | + | |
- | </p> | + | |
</body> | </body> | ||
</html> | </html> |
Latest revision as of 07:07, 19 December 2007
Introduction to EvoGEM | Project Design | Final Results of EvoGEM |
EvoGEM was developed using C++ and a graphics agent engine known as Vigo::3D.is a library for multi-agent simulation and visualization in 3D space that was developed at the University of Calgary by Ian Burleigh. The code for Vigo::3D is open source at available at http://vigo.sourceforge.net/docs/dox/html/main.html. By using Vigo::3D the EvoGEM project has been able to generate clear graphical representations of the systems that are evolved. This gives a nice qualitative visual view of how the system works.
The Simulations
The image shows a still screen shot from one of the movies generated by EvoGEM. The large fairly transparent purple spheres represent RNA Polymerases. These polymerases are preprogrammed in with characteristics that define their ability to bind to the promoter, which is the darker green bar at the end of the main circut. The lighter green portion corresponds to a ribosome binding site, the purpleish pink component represents the protein coding region and lastly the red part at the end of the circut is a terminator. The definitions of the other spheres floating around represent various proteins and other factors whose function will be dependent on the type of simulation being run.
The parameters of the simulation are determined before running the simulation by setting the values in a configuration file. These files determine the nature of the simulation. The configuration files allow the user to define, among other tihngs: The number of generations. The target products the system should evolve. And the mutation rates of each generation
EvoGEM and The Registry
As of yet EvoGEM lacks the ability to connect with iGEM registry and dynamically search for parts. As such our system works off of a "mini registry" file that contains approximately 200 parts. Parts from the registry are described in a way that is easily understandable by humans but very hard for a computer system to work with. Therefore the parts in our mini registry have been manually composed by reading through the part descriptions, determining which parts of the description have value to the simulation and then defining them in fields that our simulation can interpret. The result was the development of several descriptive fields which our system will recognize values for and then make the appropriate selections. Some of our fields are:
- Type - what type the part is. ie Promoter, RBS ect
- Part - The ID of the part
- Input - What the part inputs to our system
- Output - What the part outputs from our system
- Inducer - What is required to induce expression of the part
- Represser - What will repress expression of the part
EvoGEM uses these values to assess potential parts for the evolution of the desired ciruct.
Fitness Function
Our system works by using a fitness function to assess whether or not an evolved ciruct is "good". There are two possible solutions that can create a fittness function that is fexible enough to account for the desired properties:
The frst of those is to create a function that will take all desired behaviors as an input and will grant a better fitness score to a circuit based on those. That means that if a user inputs that he or she wants a circuit that produces molecules A, B and C and oscillates between them, the function will assign points based on the existence of molecules A,B and C and upon seeing oscillation. Of course, that requires the behavior of oscillation to be defined in a way the system will be able to identify it and quantify different sorts of oscillations (based on amplitude, wavelength, etc'). This becomes a major drawback as the behaviors requested become more and more complex since more and more complex coding is required. However, a major advantage is that the evaluation and selection of circuit individuals becomes fully automated and once the process is set there is no need for further user interference until the appropriate solution has been found.
An alternative method would be to use a human being as the fitness function. The person would be displayed with the data of each circuit after a certain number of iterations and he or she would choose individuals based on the data shown. As sophisticated of a fitness function a human can be, there are several drawbacks that should not be overlooked. The first one is that such an approach would require consistent involvement of the user with the program. In that case, the only advantage gained from the software is the speed at which it considers the behavior of each circuit and the display format. Second, the user will only be able to consider several individuals at every generation simply because of the amount of data associated with each circuit (molecules produced, their quantities over time, etc'). This makes the population very small and the selective process less effcient.
Since the goal of this project is to bring EvoGEM to the point where it can consider simple behaviors (synthesis of specific molecules) the first approach was chosen for developing our fitness fuction.
Algorithm
The main algorithm that will evaluate the different circuits will look at each circuit in its environment. The function would take into account two main components: Whether or not did the circuit produce the requested molecules, and the length of the circuit (the shorter, the better). The reason for the second argument to be taken into account is since the software is supposed to simulate circuits that are to suc- ceed in vivo. Longer DNA circuits have a lower success rate in being integrated into bacteria than shorter ones. There can be two possible ways to reward circuits according to their production of desired molecules: First, a reward pro- portional to the number of molecules can be given to the circuit (e.g. 1 point/ molecule produced, so 100 molecules give 100 points). A second approach would be to give a constant reward for each molecule type. In such a case, if the user requests for two molecule types, A and B, and the circuit produced 50 units of molecule A and 150 units of molecule B, they both give that circuit a constant reward (e.g. 100 points for each type). Which of these two methods of rewards is superior to the other, if at all, will have to be determined experimentally.
Another feature of the function to consider is whether or not the selection of offspring for each consecutive generation is done in a greedy matter (e.g. always choose the best 2) or in a matter more similar to a roulette wheel (e.g. choose 2 with a higher chance of choosing the best ones and a lower chance of choosing the worst ones). The reason for such a consideration is the fact that some molecules have a pathway that allows them to be synthesized. For example, suppose we have a protein A that is a complex, i.e. it is composed of several different sub-units of proteins.Suppose one needs proteins B, C and D to combine and create A. That means we need a bio-brick the produces B, one that produces C and one that produces D for any chance to produce A. However, the circuit than becomes longer, and so loses points, before the final product is obtained, and a reward is given for it. In this case, if the system keeps choosing the best circuits, it could never utilize the unfinished circuits that lead to the formation of molecule A. The greedy approach very quickly fails once the problem becomes more complex.