Calgary/evoGEM projectDesign
From 2007.igem.org
EvoGEM was developed using C++ and a graphics agent engine known as Vigo::3D. Vigo::3D is a library for multi-agent simulation and visualization in 3D space that was developed at the University of Calgary by Ian Burleigh. The code for Vigo::3D is open source and available at http://vigo.sourceforge.net/docs/dox/html/main.html. By using Vigo::3D, the EvoGEM project has been able to generate clear graphical representations of the systems that are evolved, which gives a qualitative visual view of how each system works.
The Simulations
The image shows a still screenshot from one of the movies generated by EvoGEM. The large, fairly transparent purple spheres represent RNA polymerases. These polymerases are preprogrammed with characteristics that define their ability to bind to the promoter, which is the darker green bar at the end of the main circuit. The lighter green portion corresponds to a ribosome binding site, the purplish-pink component represents the protein coding region, and the red part at the other end of the circuit is a terminator. The other spheres floating around represent various proteins and other factors whose function depends on the type of simulation being run.
The parameters of a simulation are set before it is run by editing the values in a configuration file. These files determine the nature of the simulation and allow the user to define, among other things:
- The number of generations
- The target products the system should evolve
- The mutation rates of each generation
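The page does not show the format of the configuration files. As a rough sketch only, assuming a plain text file of `key = value` lines, the three settings listed above could be read into a small struct. The names `SimulationConfig` and `parseConfig`, the keys `generations`, `targets` and `mutation_rate`, and the use of a single mutation rate are all assumptions made for illustration, not EvoGEM's actual format.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the settings listed above. The field names
// and the "key = value" syntax are assumptions, not EvoGEM's real format.
struct SimulationConfig {
    int generations = 0;               // number of generations to evolve
    std::vector<std::string> targets;  // target products the system should evolve
    double mutationRate = 0.0;         // mutation rate (one value assumed here)
};

// Remove leading and trailing spaces, tabs and carriage returns.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Parse lines such as "generations = 50" or "targets = A, B, C".
SimulationConfig parseConfig(const std::string& text) {
    SimulationConfig cfg;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip lines without a key
        const std::string key = trim(line.substr(0, eq));
        const std::string value = trim(line.substr(eq + 1));
        if (key == "generations") {
            cfg.generations = std::stoi(value);
        } else if (key == "mutation_rate") {
            cfg.mutationRate = std::stod(value);
        } else if (key == "targets") {
            // Split the comma-separated list of target molecule names.
            std::istringstream items(value);
            std::string item;
            while (std::getline(items, item, ',')) {
                item = trim(item);
                if (!item.empty()) cfg.targets.push_back(item);
            }
        }
    }
    return cfg;
}
```

Under these assumptions, a file containing `generations = 50`, `targets = A, B, C` and `mutation_rate = 0.05` on separate lines would yield a configuration with 50 generations, three targets and a 5% mutation rate.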
EvoGEM and The Registry
EvoGEM does not yet have the ability to connect to the iGEM Registry and dynamically search for parts. Instead, our system works from a "mini registry" file that contains approximately 200 parts. Parts in the Registry are described in a way that is easily understandable by humans but very hard for a computer system to work with. The parts in our mini registry were therefore composed manually: we read through each part description, determined which parts of the description have value to the simulation, and then defined them in fields that our simulation can interpret. The result was a set of descriptive fields whose values our system recognizes and uses to make the appropriate selections. Some of our fields are:
- Type - What type the part is, e.g. Promoter, RBS, etc.
- Part - The ID of the part
- Input - What the part inputs to our system
- Output - What the part outputs from our system
- Inducer - What is required to induce expression of the part
- Repressor - What will repress expression of the part
EvoGEM uses these values to assess potential parts for the evolution of the desired circuit.
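To make the field list concrete, here is a minimal sketch of how one mini-registry entry might be held in memory and searched. The layout of the actual mini registry file is not shown on this page, so the struct `RegistryPart`, its member names and the helper `findCandidates` are hypothetical, and the matching rule (same type, same output) is only one plausible way of assessing parts.

```cpp
#include <string>
#include <vector>

// Hypothetical record mirroring the six fields listed above. The real
// layout of EvoGEM's mini registry file is not shown in the text.
struct RegistryPart {
    std::string type;       // "Type" field, e.g. "Promoter" or "RBS"
    std::string id;         // "Part" field: the ID of the part
    std::string input;      // what the part inputs to the system
    std::string output;     // what the part outputs from the system
    std::string inducer;    // required to induce expression ("" if none)
    std::string repressor;  // represses expression ("" if none)
};

// Return every part of the given type whose output matches `molecule`.
// This is an assumed selection rule, used here only for illustration.
std::vector<RegistryPart> findCandidates(const std::vector<RegistryPart>& registry,
                                         const std::string& type,
                                         const std::string& molecule) {
    std::vector<RegistryPart> matches;
    for (const RegistryPart& part : registry) {
        if (part.type == type && part.output == molecule) {
            matches.push_back(part);
        }
    }
    return matches;
}
```

With a registry loaded into a `std::vector<RegistryPart>`, a call such as `findCandidates(registry, "Coding", "A")` would return the coding parts that output molecule A, which the evolutionary step could then try inserting into a circuit.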
Fitness Function
Our system works by using a fitness function to assess whether or not an evolved circuit is "good". There are two possible ways to create a fitness function that is flexible enough to account for the desired properties.
The first is to create a function that takes all desired behaviors as input and grants a better fitness score to a circuit based on them. If a user specifies that he or she wants a circuit that produces molecules A, B and C and oscillates between them, the function will assign points for the existence of molecules A, B and C and for observed oscillation. Of course, this requires oscillation to be defined in a way that lets the system identify it and quantify different sorts of oscillation (based on amplitude, wavelength, etc.). This becomes a major drawback as the requested behaviors grow more complex, since increasingly complex code is required. However, a major advantage is that the evaluation and selection of individual circuits becomes fully automated: once the process is set up, there is no need for further user intervention until an appropriate solution has been found.
An alternative method would be to use a human being as the fitness function. The person would be shown the data of each circuit after a certain number of iterations and would choose individuals based on that data. However sophisticated a fitness function a human can be, there are several drawbacks that should not be overlooked. First, such an approach would require constant involvement of the user with the program. In that case, the only advantages gained from the software are the speed at which it computes the behavior of each circuit and the display format. Second, the user would only be able to consider a few individuals in every generation, simply because of the amount of data associated with each circuit (molecules produced, their quantities over time, etc.). This makes the population very small and the selective process less efficient.
Since the goal of this project is to bring EvoGEM to the point where it can consider simple behaviors (synthesis of specific molecules), the first approach was chosen for developing our fitness function.
Algorithm
The main algorithm that evaluates the different circuits looks at each circuit in its environment. The function takes two main components into account: whether or not the circuit produced the requested molecules, and the length of the circuit (the shorter, the better). Length is taken into account because the software is meant to simulate circuits that are to succeed in vivo, and longer DNA circuits have a lower success rate in being integrated into bacteria than shorter ones. There are two possible ways to reward circuits according to their production of desired molecules. First, a reward proportional to the number of molecules can be given to the circuit (e.g. 1 point per molecule produced, so 100 molecules give 100 points). A second approach is to give a constant reward for each molecule type. In that case, if the user requests two molecule types, A and B, and the circuit produces 50 units of molecule A and 150 units of molecule B, each type earns the circuit the same constant reward (e.g. 100 points for each type). Which of these two reward methods is superior, if either, will have to be determined experimentally.
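The two reward schemes and the length component can be sketched as a single scoring function. This is an illustrative sketch, not EvoGEM's code: the names `CircuitResult`, `RewardScheme` and `fitness` are hypothetical, and the linear length penalty and its default weight are assumptions, since the text only says that shorter circuits should score better.

```cpp
#include <map>
#include <string>
#include <vector>

// The two reward schemes described in the text.
enum class RewardScheme { Proportional, ConstantPerType };

// Hypothetical summary of one simulated circuit.
struct CircuitResult {
    std::map<std::string, int> produced;  // molecule name -> units produced
    int length = 0;                       // number of parts in the circuit
};

// Score a circuit: reward the requested molecules, penalise length.
// pointsPerMolecule, pointsPerType and lengthPenalty are illustrative
// defaults; the text leaves the actual weights to experiment.
double fitness(const CircuitResult& circuit,
               const std::vector<std::string>& targets,
               RewardScheme scheme,
               double pointsPerMolecule = 1.0,
               double pointsPerType = 100.0,
               double lengthPenalty = 5.0) {
    double score = 0.0;
    for (const std::string& target : targets) {
        const auto it = circuit.produced.find(target);
        if (it == circuit.produced.end() || it->second <= 0) {
            continue;  // requested molecule was not produced: no reward
        }
        if (scheme == RewardScheme::Proportional) {
            score += pointsPerMolecule * it->second;  // 1 point per unit
        } else {
            score += pointsPerType;  // flat reward per molecule type
        }
    }
    // Assumed linear penalty so that shorter circuits score higher.
    return score - lengthPenalty * circuit.length;
}
```

For the example in the text (50 units of A and 150 units of B), both schemes happen to award 200 points before the length penalty is applied. They diverge as soon as the quantities change: a circuit producing only 50 units of A earns 50 points under the proportional scheme and 100 under the constant one.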
Another feature of the function to consider is whether the selection of offspring for each consecutive generation is done in a greedy manner (e.g. always choose the best 2) or in a manner more similar to a roulette wheel (e.g. choose 2, with a higher chance of choosing the best ones and a lower chance of choosing the worst ones). The reason for this consideration is that some molecules are synthesized through a multi-step pathway. For example, suppose protein A is a complex, i.e. it is composed of several different protein sub-units, and that proteins B, C and D must combine to create A. That means we need a BioBrick that produces B, one that produces C and one that produces D for any chance of producing A. However, the circuit then becomes longer, and so loses points, before the final product is obtained and a reward is given for it. In this case, if the system always chooses the best circuits, it may never make use of the unfinished circuits that lead to the formation of molecule A. The greedy approach therefore fails very quickly once the problem becomes more complex.
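The difference between the two selection strategies can be illustrated with a short sketch. The function names `selectGreedy` and `selectRoulette` are hypothetical, and shifting the scores so that the worst circuit keeps a small positive weight is an assumption added here because a length penalty can produce negative scores. The roulette version draws with replacement, so the same circuit may be picked more than once.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Greedy selection: indices of the `count` highest-scoring circuits,
// best first.
std::vector<std::size_t> selectGreedy(const std::vector<double>& fitness,
                                      std::size_t count) {
    std::vector<std::size_t> order(fitness.size());
    std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
    count = std::min(count, order.size());
    std::partial_sort(order.begin(), order.begin() + count, order.end(),
                      [&](std::size_t a, std::size_t b) {
                          return fitness[a] > fitness[b];
                      });
    order.resize(count);
    return order;
}

// Roulette-wheel selection: each draw picks index i with probability
// proportional to its shifted score, so weak circuits keep a small chance.
// Draws are made with replacement.
std::vector<std::size_t> selectRoulette(const std::vector<double>& fitness,
                                        std::size_t count,
                                        std::mt19937& rng) {
    if (fitness.empty()) return {};
    // Assumed shift: the worst circuit gets weight 1 instead of zero or less.
    const double lowest = *std::min_element(fitness.begin(), fitness.end());
    std::vector<double> weights;
    for (double f : fitness) weights.push_back(f - lowest + 1.0);
    std::discrete_distribution<std::size_t> wheel(weights.begin(), weights.end());
    std::vector<std::size_t> chosen;
    for (std::size_t i = 0; i < count; ++i) chosen.push_back(wheel(rng));
    return chosen;
}
```

With scores such as `{10, -5, 42, 7}`, `selectGreedy` always returns the circuits at indices 2 and 0, whereas `selectRoulette` will occasionally return index 1. That occasional survival is what gives a longer, not yet rewarded intermediate circuit (one that so far produces only B and C, for instance) a chance to reach the next generation.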