Variety show: plant genomes sequenced
The genomes of 18 different and varied strains of the thale cress, Arabidopsis thaliana, have been sequenced by an international group led by Oxford University scientists.
Arabidopsis is a standard model organism in plant genetics labs, in the same way that other scientists study E. coli, yeast and fruit flies as models from which they can draw general lessons about the way genes and biological pathways work. The genome of the thale cress was decoded in 2000 to act as a reference for studies in plant genetics.
Oxford Science Blog asked lead researcher Professor Richard Mott of the Wellcome Trust Centre for Human Genetics about the current study providing 18 new genomes, and what it offers the field.
The research, published in Nature, also involved Oxford scientists from the departments of Plant Sciences and Statistics.
Oxford Science Blog: Why is Arabidopsis so important in understanding plant genetics?
Richard Mott: Arabidopsis has become the standard model for much of plant genetics research.
It is small and grows quickly, it has an accurately sequenced reference genome that is relatively compact and there is a wealth of molecular tools with which to probe gene function.
Arabidopsis is a brassica – that is, a member of the cabbage family. But most of its genes are similar to those found in other plants, including important crops. It is generally much easier to figure out the functions of genes in Arabidopsis and apply this knowledge to other species.
OSB: I thought its genome had been sequenced already. What does this new study add?
RM: Arabidopsis is a highly variable species, at both the genetic and phenotypic [observable characteristic] level.
Several recent studies have begun to catalogue this genetic variation. Our study differs in that rather than interpret this variability in relation to the reference genome sequence (called Col-0), we have assembled 18 Arabidopsis genomes very accurately, so that we could determine the gene content of each.
What we found was quite surprising. If we had simply lifted over the genes annotated in the reference Col-0 onto each genome, then we would have predicted that about a third of the genes were severely altered (or even non-functional) in at least one of the 18 genomes.
But because we also collected gene expression data (essentially the sequences of the protein-coding genes), we could see that in many cases the gene structures changed in a way that mitigated these effects.
This means that we need to move from a view of Arabidopsis where we interpret the effects of variation relative to the reference, to one where each genome is treated on an equal footing. It will be interesting to see if this also applies to other species.
OSB: What has been learned about the genetic variation between different strains of this species?
RM: Along with several other recent studies, we found there is a lot of variation in this 119 Mb genome, not only single-letter changes in the DNA code (about 3 million) but also many insertions and deletions (over 1 million).
We also found about 100,000 ‘imbalanced substitutions’, where a stretch of reference genome was replaced with an entirely different sequence of a different length. Only about 7% of genes were completely conserved between the genomes.
OSB: What does this tell us?
RM: One important reason for studying these particular 18 genomes is that they are the progenitors of a much larger population of over 700 inbred lines, called ‘MAGIC’ (Multi-parental Advanced Generation InterCross).
The MAGIC lines are being used in a number of labs around the world to study a wide range of phenotypes, such as growth and disease resistance.
Each MAGIC genome is a mosaic of the genomes we sequenced, so by stitching together these genomes in the right way, we can predict the genome sequences of a much larger population. In effect we have sequenced the genomes of all these lines for the price of sequencing 18.
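The mosaic idea described above can be illustrated with a toy sketch in Python. This is not the study's actual method: the founder names, the sequences, the segment list and the `stitch_mosaic` function are all hypothetical, and the sketch assumes every founder shares one coordinate system, which real genomes with insertions and deletions do not.

```python
from typing import Dict, List, Tuple

# A segment says: positions [start, end) of this line were inherited
# from the named founder. (Hypothetical representation for illustration.)
Segment = Tuple[int, int, str]


def stitch_mosaic(founders: Dict[str, str], segments: List[Segment]) -> str:
    """Predict a line's sequence by copying each segment from its founder.

    Assumes all founder sequences share the same coordinates and that the
    segments tile the sequence from position 0 without gaps or overlaps.
    """
    pieces = []
    expected_start = 0
    for start, end, founder in sorted(segments):
        if start != expected_start:
            raise ValueError(f"gap or overlap at position {start}")
        pieces.append(founders[founder][start:end])
        expected_start = end
    return "".join(pieces)


# Toy founder genomes (made up; real ones are about 119 Mb each).
founders = {
    "A": "AAAAAAAAAA",
    "B": "CCCCCCCCCC",
    "C": "GGGGGGGGGG",
}

# Toy mosaic: which founder contributed which stretch of the line.
segments = [(0, 4, "A"), (4, 7, "C"), (7, 10, "B")]

predicted = stitch_mosaic(founders, segments)
print(predicted)
```

In this toy model, once the founders are sequenced and the boundaries of each inherited stretch are known, no further sequencing of the derived line is needed to predict its sequence.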
OSB: Are the findings relevant for other plants?
RM: Arabidopsis is primarily used to understand fundamental mechanisms in plants. This includes the response to the environment. For example, the most variable genes in our study are those whose function relates to response to the biotic environment – disease-resistance genes and so on. This is going to be relevant to studies on disease resistance in the MAGIC lines.
It is expected that lessons learned in Arabidopsis will translate to crops. In fact, there are similar populations of MAGIC lines being made in crops such as wheat. But the wheat genome is about 80 times larger than the Arabidopsis genome and much harder to assemble, so the work we have done here may inform studies in these other populations.