Calgary/evoGEM projectDesign

From 2007.igem.org

< Calgary(Difference between revisions)
 
Line 1: Line 1:
<html>
<html>
 +
<!--
 +
  evoGEM Project Design
 +
  This page describes the design of the evoGEM project
 +
  All information on this page was adapted from Boris Shabash, the creator of evoGEM
 +
-->
<head>
<head>
 +
<!--
 +
  Cascading Style Sheet (CSS) component
 +
  in plain html style sheets are completely independent of the HTML page but are included in this page to simplify design within a wiki script environment.
 +
  This section describes the layout, colors and images displayed on the page
 +
-->
<style>
<style>
Line 65: Line 75:
</head>
</head>
<body>
<body>
 +
 +
<!--
 +
  evoGEM Title
 +
  The title and and top level navigation links
 +
-->
<table class="header">
<table class="header">
   <tr>
   <tr>
Line 72: Line 87:
   </tr>
   </tr>
</table>
</table>
 +
 +
<!--
 +
  evoGEM main menu
 +
  The navigation menu for the evoGEM section of the site
 +
-->
<table class="links" >
<table class="links" >
   <tr>
   <tr>
Line 78: Line 98:
     <td align="center" ><a class="mainLinks" href="https://2007.igem.org/Calgary/evoGEM_results" title="Results" >Final Results of EvoGEM</a> </td>
     <td align="center" ><a class="mainLinks" href="https://2007.igem.org/Calgary/evoGEM_results" title="Results" >Final Results of EvoGEM</a> </td>
</table>
</table>
-
<p>
 
-
EvoGEM was developed using C++ and a graphics agent engine known as Vigo::3D.is a library for multi-agent simulation and visualization in 3D space that was developed at the University of Calgary by Ian Burleigh. The code for Vigo::3D is open source at available at <a href="http://vigo.sourceforge.net/docs/dox/html/main.html" tilte="vigo code">http://vigo.sourceforge.net/docs/dox/html/main.html</a>. By using Vigo::3D the EvoGEM project has been able to generate clear graphical representations of the systems that are evolved. This gives a nice qualitative visual view of how the system works.
 
-
</p>
 
-
<div style="float:left; border:#CCCCCC thin outset; padding:2px; margin:18px;">
 
-
<img src="https://static.igem.org/mediawiki/2007/f/f3/EvoGemSim3.gif" alt="EvoGEM simulation results" />
 
-
</div>
 
-
<p style="margin:10px; font-size:18px"><b>The Simulations</b></p>
 
-
<p>
 
-
The image shows a still screen shot from one of the movies generated by EvoGEM. The large fairly transparent purple spheres represent RNA Polymerases. These polymerases are preprogrammed in with characteristics that define their ability to bind to the promoter, which is the darker green bar at the end of the main circut. The lighter green portion corresponds to a ribosome binding site, the purpleish pink component represents the protein coding region and lastly the red part at the end of the circut is a terminator. The definitions of the other spheres floating around represent various proteins and other factors whose function will be dependent on the type of simulation being run.
 
-
</p>
 
-
<p>
 
-
The parameters of the simulation are determined before running the simulation by setting the values in a <em>configuration file</em>. These files determine the nature of the simulation. The configuration files allow the user to define, among other tihngs: The number of generations. The target products the system should evolve. And the mutation rates of each generation
 
-
</li>
 
-
</ul>
 
 +
<!--
 +
  The following paragraphs introduction some of the background behind evoGEM
 +
  Introduction to evoGEM and Vigo::3D
 +
-->
 +
<p> EvoGEM was developed using C++ and a graphics agent engine known as Vigo::3D.is a library for multi-agent simulation and visualization in 3D space that was developed at the University of Calgary by Ian Burleigh. The code for Vigo::3D is open source at available at <a href="http://vigo.sourceforge.net/docs/dox/html/main.html" tilte="vigo code">http://vigo.sourceforge.net/docs/dox/html/main.html</a>. By using Vigo::3D the EvoGEM project has been able to generate clear graphical representations of the systems that are evolved. This gives a nice qualitative visual view of how the system works. </p>
 +
<div style="float:left; border:#CCCCCC thin outset; padding:2px; margin:18px;"> <img src="https://static.igem.org/mediawiki/2007/f/f3/EvoGemSim3.gif" alt="EvoGEM simulation results" /> </div>
 +
<p style="margin:10px; font-size:18px"><b>The Simulations</b></p>
 +
<p> The image shows a still screen shot from one of the movies generated by EvoGEM. The large fairly transparent purple spheres represent RNA Polymerases. These polymerases are preprogrammed in with characteristics that define their ability to bind to the promoter, which is the darker green bar at the end of the main circut. The lighter green portion corresponds to a ribosome binding site, the purpleish pink component represents the protein coding region and lastly the red part at the end of the circut is a terminator. The definitions of the other spheres floating around represent various proteins and other factors whose function will be dependent on the type of simulation being run. </p>
 +
<p> The parameters of the simulation are determined before running the simulation by setting the values in a <em>configuration file</em>. These files determine the nature of the simulation. The configuration files allow the user to define, among other tihngs: The number of generations. The target products the system should evolve. And the mutation rates of each generation
 +
  </li>
 +
  </ul>
<p style="margin:10px; font-size:18px"><b>EvoGEM and The Registry</b></p>
<p style="margin:10px; font-size:18px"><b>EvoGEM and The Registry</b></p>
<p>As of yet EvoGEM lacks the ability to connect with iGEM registry and dynamically search for parts. As such our system works off of a "mini registry" file that contains approximately 200 parts. Parts from the registry are described in a way that is easily understandable by humans but very hard for a computer system to work with. Therefore the parts in our mini registry have been manually composed by reading through the part descriptions, determining which parts of the description have value to the simulation and then defining them in fields that our simulation can interpret. The result was the development of several descriptive fields which our system will recognize values for and then make the appropriate selections. Some of our fields are:</p>
<p>As of yet EvoGEM lacks the ability to connect with iGEM registry and dynamically search for parts. As such our system works off of a "mini registry" file that contains approximately 200 parts. Parts from the registry are described in a way that is easily understandable by humans but very hard for a computer system to work with. Therefore the parts in our mini registry have been manually composed by reading through the part descriptions, determining which parts of the description have value to the simulation and then defining them in fields that our simulation can interpret. The result was the development of several descriptive fields which our system will recognize values for and then make the appropriate selections. Some of our fields are:</p>
<ul>
<ul>
-
<li>
+
  <li> <em>Type</em> - what type the part is. ie Promoter, RBS ect </li>
-
<em>Type</em> - what type the part is. ie Promoter, RBS ect
+
  <li> <em>Part</em> - The ID of the part </li>
-
</li>
+
  <li> <em>Input</em> - What the part inputs to our system </li>
-
<li>
+
  <li> <em>Output</em> - What the part outputs from our system </li>
-
<em>Part</em> - The ID of the part
+
  <li> <em>Inducer</em> - What is required to induce expression of the part </li>
-
</li>
+
  <li> <em>Represser</em> - What will repress expression of the part </li>
-
<li>
+
-
<em>Input</em> - What the part inputs to our system
+
-
</li>
+
-
<li>
+
-
<em>Output</em> - What the part outputs from our system
+
-
</li>
+
-
<li>
+
-
<em>Inducer</em> - What is required to induce expression of the part
+
-
</li>
+
-
<li>
+
-
<em>Represser</em> - What will repress expression of the part
+
-
</li>
+
</ul>
</ul>
-
<p>
+
<p> EvoGEM uses these values to assess potential parts for the evolution of the desired ciruct. </p>
-
EvoGEM uses these values to assess potential parts for the evolution of the desired ciruct.
+
-
</p>
+
<p style="margin:10px; font-size:18px"><b>Fitness Function</b></p>
<p style="margin:10px; font-size:18px"><b>Fitness Function</b></p>
-
<p>
+
<p> Our system works by using a <em>fitness function</em> to assess whether or not an evolved ciruct is "good". There are two possible solutions that can create a fittness
-
Our system works by using a <em>fitness function</em> to assess whether or not an evolved ciruct is "good". There are two possible solutions that can create a fittness
+
  function that is fexible enough to account for the desired properties: </p>
-
function that is fexible enough to account for the desired properties:
+
<p> The frst of those is to create a function that will take all
-
</p>
+
  desired behaviors as an input and will grant a better fitness
-
<p>
+
  score to a circuit based on those. That means that if a user
-
The frst of those is to create a function that will take all
+
  inputs that he or she wants a circuit that produces molecules
-
desired behaviors as an input and will grant a better fitness
+
  A, B and C and oscillates between them, the function will
-
score to a circuit based on those. That means that if a user
+
  assign points based on the existence of molecules A,B and
-
inputs that he or she wants a circuit that produces molecules
+
  C and upon seeing oscillation. Of course, that requires the
-
A, B and C and oscillates between them, the function will
+
  behavior of oscillation to be defined in a way the system will
-
assign points based on the existence of molecules A,B and
+
  be able to identify it and quantify different sorts of oscillations (based on amplitude, wavelength, etc'). This becomes
-
C and upon seeing oscillation. Of course, that requires the
+
  a major drawback as the behaviors requested become more
-
behavior of oscillation to be defined in a way the system will
+
  and more complex since more and more complex coding is
-
be able to identify it and quantify different sorts of oscillations (based on amplitude, wavelength, etc'). This becomes
+
  required. However, a major advantage is that the evaluation
-
a major drawback as the behaviors requested become more
+
  and selection of circuit individuals becomes fully automated
-
and more complex since more and more complex coding is
+
  and once the process is set there is no need for further user
-
required. However, a major advantage is that the evaluation
+
  interference until the appropriate solution has been found. </p>
-
and selection of circuit individuals becomes fully automated
+
<p> An alternative method would be to use a human being as
-
and once the process is set there is no need for further user
+
  the fitness function. The person would be displayed with the
-
interference until the appropriate solution has been found.
+
  data of each circuit after a certain number of iterations and
-
</p>
+
  he or she would choose individuals based on the data shown.
-
<p>
+
  As sophisticated of a fitness function a human can be, there
-
An alternative method would be to use a human being as
+
  are several drawbacks that should not be overlooked. The
-
the fitness function. The person would be displayed with the
+
  first one is that such an approach would require consistent
-
data of each circuit after a certain number of iterations and
+
  involvement of the user with the program. In that case,
-
he or she would choose individuals based on the data shown.
+
  the only advantage gained from the software is the speed
-
As sophisticated of a fitness function a human can be, there
+
  at which it considers the behavior of each circuit and the
-
are several drawbacks that should not be overlooked. The
+
  display format. Second, the user will only be able to consider
-
first one is that such an approach would require consistent
+
  several individuals at every generation simply because of
-
involvement of the user with the program. In that case,
+
  the amount of data associated with each circuit (molecules
-
the only advantage gained from the software is the speed
+
  produced, their quantities over time, etc'). This makes the
-
at which it considers the behavior of each circuit and the
+
  population very small and the selective process less effcient. </p>
-
display format. Second, the user will only be able to consider
+
<p> Since the goal of this project is to bring EvoGEM to the
-
several individuals at every generation simply because of
+
  point where it can consider simple behaviors (synthesis of
-
the amount of data associated with each circuit (molecules
+
  specific molecules) the first approach was chosen for developing our fitness fuction. </p>
-
produced, their quantities over time, etc'). This makes the
+
-
population very small and the selective process less effcient.
+
-
</p>
+
-
<p>
+
-
Since the goal of this project is to bring EvoGEM to the
+
-
point where it can consider simple behaviors (synthesis of
+
-
specific molecules) the first approach was chosen for developing our fitness fuction.
+
-
</p>
+
<p style="margin:10px; font-size:18px"><b>Algorithm</b></p>
<p style="margin:10px; font-size:18px"><b>Algorithm</b></p>
-
<p>
+
<p> The main algorithm that will evaluate the different circuits
-
The main algorithm that will evaluate the different circuits
+
  will look at each circuit in its environment. The function
-
will look at each circuit in its environment. The function
+
  would take into account two main components: Whether or
-
would take into account two main components: Whether or
+
  not did the circuit produce the requested molecules, and the
-
not did the circuit produce the requested molecules, and the
+
  length of the circuit (the shorter, the better). The reason
-
length of the circuit (the shorter, the better). The reason
+
  for the second argument to be taken into account is since
-
for the second argument to be taken into account is since
+
  the software is supposed to simulate circuits that are to suc-
-
the software is supposed to simulate circuits that are to suc-
+
  ceed in vivo. Longer DNA circuits have a lower success rate
-
ceed in vivo. Longer DNA circuits have a lower success rate
+
  in being integrated into bacteria than shorter ones. There
-
in being integrated into bacteria than shorter ones. There
+
  can be two possible ways to reward circuits according to
-
can be two possible ways to reward circuits according to
+
  their production of desired molecules: First, a reward pro-
-
their production of desired molecules: First, a reward pro-
+
  portional to the number of molecules can be given to the
-
portional to the number of molecules can be given to the
+
  circuit (e.g. 1 point/ molecule produced, so 100 molecules
-
circuit (e.g. 1 point/ molecule produced, so 100 molecules
+
  give 100 points). A second approach would be to give a
-
give 100 points). A second approach would be to give a
+
  constant reward for each molecule type. In such a case, if
-
constant reward for each molecule type. In such a case, if
+
  the user requests for two molecule types, A and B, and the
-
the user requests for two molecule types, A and B, and the
+
  circuit produced 50 units of molecule A and 150 units of
-
circuit produced 50 units of molecule A and 150 units of
+
  molecule B, they both give that circuit a constant reward
-
molecule B, they both give that circuit a constant reward
+
  (e.g. 100 points for each type). Which of these two methods
-
(e.g. 100 points for each type). Which of these two methods
+
  of rewards is superior to the other, if at all, will have to
-
of rewards is superior to the other, if at all, will have to
+
  be determined experimentally. </p>
-
be determined experimentally.
+
<p> Another feature of the function to consider is whether or not
-
</p>
+
  the selection of offspring for each consecutive generation is
-
<p>
+
  done in a greedy matter (e.g. always choose the best 2) or
-
Another feature of the function to consider is whether or not
+
  in a matter more similar to a roulette wheel (e.g. choose 2
-
the selection of offspring for each consecutive generation is
+
  with a higher chance of choosing the best ones and a lower
-
done in a greedy matter (e.g. always choose the best 2) or
+
  chance of choosing the worst ones). The reason for such a
-
in a matter more similar to a roulette wheel (e.g. choose 2
+
  consideration is the fact that some molecules have a pathway
-
with a higher chance of choosing the best ones and a lower
+
  that allows them to be synthesized. For example, suppose
-
chance of choosing the worst ones). The reason for such a
+
  we have a protein A that is a complex, i.e. it is composed
-
consideration is the fact that some molecules have a pathway
+
  of several different sub-units of proteins.Suppose one needs
-
that allows them to be synthesized. For example, suppose
+
  proteins B, C and D to combine and create A. That means
-
we have a protein A that is a complex, i.e. it is composed
+
  we need a bio-brick the produces B, one that produces C and
-
of several different sub-units of proteins.Suppose one needs
+
  one that produces D for any chance to produce A. However,
-
proteins B, C and D to combine and create A. That means
+
  the circuit than becomes longer, and so loses points, before
-
we need a bio-brick the produces B, one that produces C and
+
  the final product is obtained, and a reward is given for it.
-
one that produces D for any chance to produce A. However,
+
  In this case, if the system keeps choosing the best circuits,
-
the circuit than becomes longer, and so loses points, before
+
  it could never utilize the unfinished circuits that lead to the
-
the final product is obtained, and a reward is given for it.
+
  formation of molecule A. The greedy approach very quickly
-
In this case, if the system keeps choosing the best circuits,
+
  fails once the problem becomes more complex. </p>
-
it could never utilize the unfinished circuits that lead to the
+
-
formation of molecule A. The greedy approach very quickly
+
-
fails once the problem becomes more complex.
+
-
</p>
+
</body>
</body>
</html>
</html>

Latest revision as of 07:07, 19 December 2007

back to U of C Homepage Check out evoGEM

EvoGEM was developed using C++ and a graphics agent engine known as Vigo::3D.is a library for multi-agent simulation and visualization in 3D space that was developed at the University of Calgary by Ian Burleigh. The code for Vigo::3D is open source at available at http://vigo.sourceforge.net/docs/dox/html/main.html. By using Vigo::3D the EvoGEM project has been able to generate clear graphical representations of the systems that are evolved. This gives a nice qualitative visual view of how the system works.

EvoGEM simulation results

The Simulations

The image shows a still screen shot from one of the movies generated by EvoGEM. The large fairly transparent purple spheres represent RNA Polymerases. These polymerases are preprogrammed in with characteristics that define their ability to bind to the promoter, which is the darker green bar at the end of the main circut. The lighter green portion corresponds to a ribosome binding site, the purpleish pink component represents the protein coding region and lastly the red part at the end of the circut is a terminator. The definitions of the other spheres floating around represent various proteins and other factors whose function will be dependent on the type of simulation being run.

The parameters of the simulation are determined before running the simulation by setting the values in a configuration file. These files determine the nature of the simulation. The configuration files allow the user to define, among other tihngs: The number of generations. The target products the system should evolve. And the mutation rates of each generation

EvoGEM and The Registry

As of yet EvoGEM lacks the ability to connect with iGEM registry and dynamically search for parts. As such our system works off of a "mini registry" file that contains approximately 200 parts. Parts from the registry are described in a way that is easily understandable by humans but very hard for a computer system to work with. Therefore the parts in our mini registry have been manually composed by reading through the part descriptions, determining which parts of the description have value to the simulation and then defining them in fields that our simulation can interpret. The result was the development of several descriptive fields which our system will recognize values for and then make the appropriate selections. Some of our fields are:

  • Type - what type the part is. ie Promoter, RBS ect
  • Part - The ID of the part
  • Input - What the part inputs to our system
  • Output - What the part outputs from our system
  • Inducer - What is required to induce expression of the part
  • Represser - What will repress expression of the part

EvoGEM uses these values to assess potential parts for the evolution of the desired ciruct.

Fitness Function

Our system works by using a fitness function to assess whether or not an evolved ciruct is "good". There are two possible solutions that can create a fittness function that is fexible enough to account for the desired properties:

The frst of those is to create a function that will take all desired behaviors as an input and will grant a better fitness score to a circuit based on those. That means that if a user inputs that he or she wants a circuit that produces molecules A, B and C and oscillates between them, the function will assign points based on the existence of molecules A,B and C and upon seeing oscillation. Of course, that requires the behavior of oscillation to be defined in a way the system will be able to identify it and quantify different sorts of oscillations (based on amplitude, wavelength, etc'). This becomes a major drawback as the behaviors requested become more and more complex since more and more complex coding is required. However, a major advantage is that the evaluation and selection of circuit individuals becomes fully automated and once the process is set there is no need for further user interference until the appropriate solution has been found.

An alternative method would be to use a human being as the fitness function. The person would be displayed with the data of each circuit after a certain number of iterations and he or she would choose individuals based on the data shown. As sophisticated of a fitness function a human can be, there are several drawbacks that should not be overlooked. The first one is that such an approach would require consistent involvement of the user with the program. In that case, the only advantage gained from the software is the speed at which it considers the behavior of each circuit and the display format. Second, the user will only be able to consider several individuals at every generation simply because of the amount of data associated with each circuit (molecules produced, their quantities over time, etc'). This makes the population very small and the selective process less effcient.

Since the goal of this project is to bring EvoGEM to the point where it can consider simple behaviors (synthesis of specific molecules) the first approach was chosen for developing our fitness fuction.

Algorithm

The main algorithm that will evaluate the different circuits will look at each circuit in its environment. The function would take into account two main components: Whether or not did the circuit produce the requested molecules, and the length of the circuit (the shorter, the better). The reason for the second argument to be taken into account is since the software is supposed to simulate circuits that are to suc- ceed in vivo. Longer DNA circuits have a lower success rate in being integrated into bacteria than shorter ones. There can be two possible ways to reward circuits according to their production of desired molecules: First, a reward pro- portional to the number of molecules can be given to the circuit (e.g. 1 point/ molecule produced, so 100 molecules give 100 points). A second approach would be to give a constant reward for each molecule type. In such a case, if the user requests for two molecule types, A and B, and the circuit produced 50 units of molecule A and 150 units of molecule B, they both give that circuit a constant reward (e.g. 100 points for each type). Which of these two methods of rewards is superior to the other, if at all, will have to be determined experimentally.

Another feature of the function to consider is whether or not the selection of offspring for each consecutive generation is done in a greedy matter (e.g. always choose the best 2) or in a matter more similar to a roulette wheel (e.g. choose 2 with a higher chance of choosing the best ones and a lower chance of choosing the worst ones). The reason for such a consideration is the fact that some molecules have a pathway that allows them to be synthesized. For example, suppose we have a protein A that is a complex, i.e. it is composed of several different sub-units of proteins.Suppose one needs proteins B, C and D to combine and create A. That means we need a bio-brick the produces B, one that produces C and one that produces D for any chance to produce A. However, the circuit than becomes longer, and so loses points, before the final product is obtained, and a reward is given for it. In this case, if the system keeps choosing the best circuits, it could never utilize the unfinished circuits that lead to the formation of molecule A. The greedy approach very quickly fails once the problem becomes more complex.