Make your own genes - A Combinatorial Genomics approach to reclaim 'junk' DNA
We asked a simple question: why did nature choose a particular region for expression? What if we moved the expression signals elsewhere? In other words, can we make our own genes from non-coding DNA? If so, what are the boundary conditions of such an approach, and what are the best-case scenarios?
To address these questions, we randomly selected six E. coli intergenic regions with no known history of transcription. Each sequence was computationally translated and searched against the NCBI non-redundant database to ensure that we had not recreated a known natural protein. The sequences were amplified, cloned into the pBAD TOPO vector and expressed in E. coli MG1655 cells. Protein expression was confirmed by Western blotting. Intracellular expression of one of the proteins inhibited cell growth, and this inhibition was completely rescued by culturing the cells in inducer-free medium. Computational structure prediction suggests a globular tertiary structure for two of the six non-natural proteins. We named these artificially constructed genes EKA (from ekam, "first" in Sanskrit). Here is the link to the paper.
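The computational translation step can be illustrated with a short sketch. It assumes the standard genetic code (NCBI translation table 1) and translates a DNA region in all six reading frames; the text does not say which frame or start codon was used for each EKA gene, so the function names and the six-frame approach here are illustrative assumptions. The database search against NCBI (for example with BLAST) is not reproduced.

```python
# Hypothetical sketch: translate an intergenic DNA region with the standard
# genetic code (NCBI table 1). Not the authors' actual pipeline.
from typing import List

BASES = "TCAG"
# Amino acids in NCBI table-1 order (first, second, third base each over T, C, A, G)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO_ACIDS
    )
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (N stays N)."""
    return dna.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(dna: str, frame: int = 0, to_stop: bool = False) -> str:
    """Translate one reading frame; '*' marks stop codons, 'X' unknown codons.

    If to_stop is True, translation halts at the first stop codon.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def six_frame_translations(dna: str) -> List[str]:
    """Three forward frames followed by three reverse-complement frames."""
    rc = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(rc, f) for f in range(3)]
```

For example, `translate("ATGGCTTAAGGG", to_stop=True)` gives `"MA"`, while the untruncated frame-0 translation is `"MA*G"`. Each of the six candidate peptides could then be searched against a protein database to check that it has no known natural equivalent.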
Having provided the first proof of concept, our next step is a genome-wide scan of intergenic regions to predict what would happen if non-coding DNA were expressed artificially. To this end, we are currently developing a web-based knowledgebase, also called EKA. Using the EKA Knowledgebase, one can look up the length, sequence composition, structure, function and cellular localization attributes that a non-coding DNA region would have if expressed.
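A few of the attributes listed for the knowledgebase can be computed directly from the hypothetical protein sequence. The sketch below reports length, amino acid composition and, as one example of a physico-chemical attribute, the Kyte-Doolittle GRAVY hydropathy score. Which scales EKA actually uses is not stated, so GRAVY and the function name `protein_attributes` are assumptions; structure, function and localization predictions require dedicated tools and are not sketched.

```python
# Hypothetical sketch of per-region attributes for a knowledgebase entry.
# GRAVY (Kyte-Doolittle) is an illustrative choice, not EKA's documented method.
from collections import Counter
from typing import Dict, Union

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def protein_attributes(protein: str) -> Dict[str, Union[int, float, Dict[str, float]]]:
    """Length, composition (fractions) and GRAVY of a translated region.

    A trailing stop symbol '*' is ignored; residues without a hydropathy
    value (e.g. 'X') count towards length and composition but not GRAVY.
    """
    residues = protein.upper().rstrip("*")
    length = len(residues)
    if length == 0:
        return {"length": 0, "composition": {}, "gravy": 0.0}
    counts = Counter(residues)
    composition = {aa: n / length for aa, n in counts.items()}
    scored = [KYTE_DOOLITTLE[aa] for aa in residues if aa in KYTE_DOOLITTLE]
    gravy = sum(scored) / len(scored) if scored else 0.0
    return {"length": length, "composition": composition, "gravy": gravy}
```

For instance, `protein_attributes("MA*")` reports a length of 2, equal fractions of M and A, and a GRAVY of 1.85. Positive GRAVY values indicate an overall hydrophobic sequence, negative values a hydrophilic one.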
The second offshoot of this work is to revisit an old, fundamental question: what makes a gene? In the past, many studies have addressed this question, leading to the identification of key sequence-based signals flanking coding regions. Our strategy is based on conformational, thermodynamic and physico-chemical parameters, with the aim of finding features common to a broad set of coding regions. Once such patterns are identified, we will turn to understanding the emergence of genes from an evolutionary perspective and to "extracting" gene-like regions from non-coding DNA. As a first step towards this goal, we trained a support vector machine (SVM) with 1,500 known coding regions and tested it against 200 known non-coding DNA sequences. Preliminary results point to a new classification system that may emerge from these studies.
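The classification step can be sketched with a linear SVM trained by the Pegasos algorithm (stochastic sub-gradient descent on the regularized hinge loss). The text does not specify the kernel, the exact features, or whether non-coding examples were used during training, so this two-class linear version is only one possible reading. The features in `placeholder_features` (GC fraction and third-codon-position GC fraction) are hypothetical stand-ins for the unspecified conformational, thermodynamic and physico-chemical parameters.

```python
# Minimal sketch, assuming a two-class linear SVM (Pegasos solver) and
# placeholder features; not the authors' actual model or feature set.
import random
from typing import List, Sequence


def placeholder_features(seq: str) -> List[float]:
    """Hypothetical features: overall GC fraction and GC fraction at codon position 3."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    third = seq[2::3]
    gc = sum(base in "GC" for base in seq) / len(seq)
    gc3 = sum(base in "GC" for base in third) / len(third) if third else 0.0
    return [gc, gc3]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, iterations: int = 2000,
                     seed: int = 0) -> List[float]:
    """Pegasos training; labels must be +1 (coding) or -1 (non-coding).

    A constant 1.0 is appended to each vector so the last weight acts as a
    bias (it is regularized too, a common simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        # Regularization step shrinks the weights
        w = [(1.0 - eta * lam) * wj for wj in w]
        if margin < 1.0:
            # Hinge loss is active: step towards the low-margin example
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (coding-like) or -1 (non-coding-like)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def accuracy(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)
```

In the setting described above, each coding region would become a feature vector labelled +1 and each non-coding region a vector labelled -1; `accuracy` on held-out sequences then gives a first estimate of how well the chosen parameters separate the two classes. Feature scaling and a held-out test split would matter in practice but are omitted here.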
The third offshoot of this work is to design artificial antimicrobial and anticancer peptides from existing non-coding sequences of prokaryotes and eukaryotes; we have found interesting hits towards these goals among E. coli intergenic regions. The fourth offshoot is to express introns, pseudogenes and repetitive sequences in different eukaryotes and observe the effects. The fifth offshoot is to design microRNA molecules for novel applications. The sixth offshoot, a much bigger challenge, is to design a novel pathway based on proteins made from junk DNA.
We believe we are looking at the emergence of a new area that we call COMBINATORIAL GENOMICS.
updated Dec. 11, 2011