Leveraging model legume information to find candidate genes for soybean Sudden Death Syndrome using the Legume Information System (LIS)

Michael D. Gonzales, Kamal Gajendran, Andrew D. Farmer, Eric Archuleta, *William D. Beavis
National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505 USA
Michael D. Gonzales: Software Engineer, Programmer/Analyst, NCGR
Kamal Gajendran: Software Engineering Manager, NCGR
Andrew D. Farmer: Principal Software Engineer, NCGR
Eric Archuleta: Senior Software Engineer, NCGR

*Corresponding author: William D. Beavis, Chief Scientific Officer
wdb@ncgr.org
Phone: 1 505 995 4412
Fax: 1 505 995 4432

i. Abstract

Comparative genomics is an emerging and powerful approach to achieve crop improvement. Using comparative genomics, information from model plant species can accelerate the discovery of genes responsible for disease and pest resistance, tolerance to plant stresses such as drought, and enhanced nutritional value, including production of anti-oxidants and anti-cancer compounds. We demonstrate here how to use the Legume Information System (LIS) for a comparative genomics study, leveraging genomic information from Medicago truncatula (barrel medic), the model legume, to find candidate genes involved with sudden death syndrome (SDS) in Glycine max (soybean). Specifically, genetic maps, physical maps, and annotated tentative consensus and EST sequences from G. max and M. truncatula can be compared. In addition, the recently published M. truncatula genomic sequences can be used to identify M. truncatula candidate genes in a genomic region syntenic to a QTL region for SDS in soybean. Genomic sequences of candidate genes from M. truncatula can then be used to identify soybean ESTs with similar sequences for primer design and cloning of potential soybean disease-causing alleles.

ii. Keywords

Legume Information System (LIS), comparative genomics, sudden death syndrome (SDS), quantitative trait loci (QTL), genetic maps, physical maps, linkage maps, model legumes, synteny, candidate genes

1.0 Introduction

A fully sequenced and annotated genome accelerates the identification of candidate genetic loci underlying phenotypes of interest. But what can be done if the genome of a species of interest has not been sequenced and is unlikely to be sequenced in the near future? Because the sequence and function of genes are largely conserved among related species, comparative genomics can leverage information and knowledge gained from a sequenced model, or reference species, to make hypotheses about the relationship between genotype and phenotype for related species.

Legumes (soybeans, dry beans, peas, alfalfa, peanuts, etc.) are important sources of proteins, oils, anti-oxidants and anti-cancer compounds, and provide organic sources of nitrogen fertilizer. Unfortunately, most crop legumes have not been sequenced because their large and complex polyploid genomes make genome sequencing cost-prohibitive. Fortunately, several legume species, including Medicago truncatula (barrel medic), Lotus japonicus (Japanese lotus) and Phaseolus vulgaris (dry beans), have relatively small and tractable genomes. A genome sequencing project (http://www.medicago.org), supported by the Noble Foundation and NSF-PGRP, has produced annotated genomic sequence for most of the M. truncatula gene space. Thus, in regions of the genome where syntenic relationships exist between barrel medic and a crop legume, the annotated genomic sequence from barrel medic can be leveraged to nominate or identify candidate genes of interest in other legumes.

As an example we will use the Legume Information System (LIS; http://www.comparative-legumes.org), a publicly accessible clade information resource that integrates genetic and molecular data from multiple legume species, to conduct cross-species genomic and transcript comparisons and identify candidate genes (1). Our goal for this tutorial is to find candidate genes for Sudden Death Syndrome (SDS) in soybean. SDS is caused by Fusarium solani f. sp. glycines, which produces toxins in the roots, resulting in root rot and leaf scorch and severely reducing soybean production each year. SDS is a major concern and has become a focus for breeders and scientists interested in producing a more resistant soybean plant. Quantitative trait loci (QTL) for SDS in soybean have been previously identified and mapped (2). The goal of QTL studies is to identify genomic regions that are statistically associated with variation in complex quantitative traits such as SDS resistance. Once QTL regions have been located, the actual genetic elements responsible for the phenotype can perhaps be identified.

To describe our approach for finding candidate genes, we will begin by using the CMap module of LIS to query and display soybean SDS QTLs on genetic maps. Next, the soybean linkage maps are compared to M. truncatula maps to identify syntenic regions containing SDS QTLs. Once genomic markers in M. truncatula have been identified as syntenic to the SDS QTL region, we utilize the M. truncatula physical maps to identify the sequenced genomic clones in the same regions. The genomic sequences within the physical region are then analyzed for candidate genes using annotations displayed in the LIS Comparative Functional Genomics Browser (CFGB). Finally, consensus sequences aligned to the genomic sequence can be analyzed using the existing annotations to isolate soybean EST sequences for follow-on primer design or further analysis.

2.0 Materials

Clade-oriented web-based information resources, such as LIS, offer both data and applications. Database content can consist of raw experimental data, but also often consists of information resulting from preliminary analyses, such as computationally generated sequence annotation and results of QTL analyses. Available applications are generally tools for further analysis and visualization of data and information. All procedures described here use the Legume Information System website available at http://www.comparative-legumes.org. A high-speed connection is recommended. The site has been optimized to work with Netscape 7.x and Internet Explorer 5.x for Windows as well as Netscape 7.x for Macintosh. This tutorial is based on analysis done in October 2005. Annotated data available in LIS were last updated in fall of 2005 using the XGI pipeline (v 2.0). As LIS is updated with new data, results displayed as part of this tutorial may change.

2.1 Legume Information System Overview

LIS integrates map, genomic and transcript data from a number of databases and allows researchers to access and compare data through a single, but multifaceted, web interface. The LIS database content and applications that we will use include the XGI transcript and genomic databases, CMap and SoyBase. All publicly available transcript and genomic data from Medicago truncatula, Lotus japonicus, Glycine max and Arabidopsis thaliana have been analyzed by a variety of computational annotation algorithms (described in detail below) and stored using NCGR’s XGI (Genome Initiative for species X) system (http://www.ncgr.org/xgi). The XGI genomic pipeline (XGI-g) analyzes genomic sequence data for each species, and the results can be used in cross-species comparisons. Cross-legume analyses include alignment of gene sequences to genomic contigs to validate ab initio gene predictions. Transcript sequences of expressed sequence tags (ESTs) from all available legumes are analyzed, annotated and stored using the XGI transcript pipeline (XGI-t). CMap, developed as part of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/cmap/index.shtml), has been incorporated into LIS to provide access to genetic and physical maps from all legume species. SoyBase (http://www.soybase.org) provides soybean map and biochemical pathway data.

2.2 Genetic and Physical Maps

2.2.1 Map Data:

All curated linkage maps from SoyBase have been incorporated into the CMap module of LIS. CMap provides researchers access to curated genetic and physical maps for Glycine max and Medicago truncatula, as well as genetic maps for Medicago sativa (alfalfa), Phaseolus vulgaris (dry beans) and Arachis hypogaea (peanut). LIS will soon incorporate genetic and physical maps for peas, lentils and other legume species as they become available. At this time, correspondences between marker loci on different maps are based on curated name matches (provided by Dr. David Grant, USDA-ARS, Ames, Iowa), taking into account the possibility of the same marker being mapped onto multiple loci within a map set for polyploid species. Marker comparisons are possible when legume genetic researchers adhere to nomenclature standards across species for mapped loci, thus further enabling identification of syntenic regions and comparative genomics.

2.2.2 Capabilities of the CMap Researcher Interface

CMap features positioned on the maps are displayed using different symbols and colors to represent the various feature types; e.g. the QTL data is color coded according to classification of QTL traits. When using the tool for comparative work, the researcher should choose a map from the database to be used as a reference map for the comparison. Comparative maps can then be added to the viewer so that alignments relative to the reference map can be investigated. The compared maps are selected from a list of all maps in the database and allow the researcher to specify the number of correspondences to the reference map. Alternatively, a comparison matrix can be used to display the number of correspondences between different maps and map sets in the database, and can be used to locate maps with high levels of synteny. These features allow researchers to compare linkage groups within species, a useful capability for these polyploid species, and among species, an essential capability for identification of syntenic regions. Lines indicating relationships between features, e.g., loci or QTL, are drawn between maps in a comparison. Feature details and map set information can be accessed from the “map view”. Map set details include species, map type, map units, curator remarks and the listing of the maps in the set. Feature details include the feature name, feature type, aliases or synonyms, map position, cross references to other databases as well as correspondence details from other maps associated with the feature. It is also possible to change the size of map images and save them for later research sessions.

2.3 Genomic Sequence Data

LIS genomic data consist of genomic sequences that have been analyzed, annotated and stored using the XGI genomic pipeline (XGI-g). Public sequence information is gathered from the NCBI’s High Throughput Genomic (HTG) division (3) for species of interest. HTG sequences from large-scale genome sequencing centers are submitted as in-process assemblies in various stages of completeness, often containing two or more contigs. LIS does not assemble the genomic data, but takes data from NCBI HTG as submitted by the sequencing centers. Each genomic sequence is then separated into its constituent contigs and analyzed using a sliding window (when appropriate to the given analysis) of length 10,000 bp with an overlap of 3000 bp. These nucleotide segments are automatically annotated using a computational pipeline that runs a series of sequence or motif similarity algorithms. The XGI pipeline compares nucleotide segments using BLASTX [v. 2.1.3] (3,4,5) against NCBI’s nr database (3), and with BLASTN [v. 2.1.3] (4,5) and TBLASTX [v. 2.1.3] (4,5) against the consensus sequences produced in the LIS transcript database. BlimpSearcher [v. 3.5] analysis (6) against the Blocks+ database (7,8) is used to identify protein motifs and families. InterProScan [v. 3.1] (9) integrates results from a variety of protein motif analysis tools using the InterPro database (10). Consensus or equivalent analysis results between overlapping pieces are merged before being stored in the LIS database. GenScan [v. 1.0] (11) performs ab initio gene prediction on the genomic sequences and is used to evaluate complete contig sequences and to define and compare the genes and exon–intron organization of the sequences. The results of the genomic pipeline are stored in the LIS genomic database and are updated on a regular basis.
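The windowing step described above can be illustrated with a short sketch. This is not the XGI-g code: only the 10,000 bp window and the 3000 bp overlap come from the text, while the function name, the 0-based half-open coordinates and the handling of the final short window are assumptions made for illustration.

```python
from typing import Iterator, Tuple

WINDOW_BP = 10_000   # window length stated in the text
OVERLAP_BP = 3_000   # overlap between consecutive windows stated in the text


def sliding_windows(contig_length: int,
                    window: int = WINDOW_BP,
                    overlap: int = OVERLAP_BP) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) coordinates covering a contig.

    Hypothetical helper: the last window is simply truncated at the
    contig end, which is an assumption, not documented pipeline behavior.
    """
    if window <= overlap:
        raise ValueError("window must be longer than overlap")
    step = window - overlap          # each window starts 7,000 bp after the last
    start = 0
    while start < contig_length:
        end = min(start + window, contig_length)
        yield start, end
        if end == contig_length:     # contig fully covered
            break
        start += step


if __name__ == "__main__":
    for start, end in sliding_windows(24_000):
        print(start, end)
```

Because consecutive windows share 3000 bp, a feature shorter than the overlap that straddles a window boundary still falls entirely inside at least one window, which is presumably why results from overlapping pieces are merged afterwards.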

2.3.1 Capabilities of the Comparative Functional Genomics Browser

The Comparative Functional Genomics Browser (CFGB) is an application that allows the researcher to visualize genomic sequence annotations. It also enables visualization of comparative transcript data aligned to genomic contigs for purposes of validating gene predictions. The CFGB gives researchers the freedom to add multiple contigs to the viewer allowing for comparative analysis between sequences. By using description matches of the different analysis types, contigs can be compared for regions with similar annotation. Each genomic sequence has been annotated using the XGI genomic pipeline and represents the annotation types as colored blocks. The location and direction of transcript in the colored blocks are represented in relation to the genomic sequence. The size of the images in the CFGB can be manipulated with zooming, panning and sorting functions. The CFGB also supports the ability to change the aspect ratios for all sequences at the same time.

2.4 Transcript data

The LIS transcript database currently consists of EST and consensus sequences for M. truncatula, G. max, L. japonicus and A. thaliana. Using the XGI transcript pipeline (XGI-t), raw public EST data are gathered from NCBI (http://www.ncbi.nlm.nih.gov/dbEST/) and annotated automatically. Where available, quality scores for EST sequences are incorporated into the database for use in subsequent analyses. Detailed metadata concerning sequence origin, such as submitting organization, organism, clonal library and methodology, are captured in the database and are viewable with the LIS interface. Libraries also are categorized by a manual curation process for more accurate querying through the interface.

The XGI-t process begins by screening raw EST data for quality. Screening operations include removal of most common vector sequences, poly (A/T) trimming, N-trimming, adapter/linker removal, length trimming and poor quality read trimming. Vector screening and adapter/linker screening remove sequence contamination of the insert that typically arises as part of the cloning process. In addition, the fidelity of a sequence read typically degenerates toward the end of the sequence, resulting in base-calling errors that are trimmed out as part of this process. Finally, low-complexity sequences represented by polyadenylated regions are removed because such sequences can produce many false positive matches in subsequent analyses. The end result of the quality screens is a high-quality ‘approved’ sequence that is then deposited in the database. An EST that has failed the XGI vector screen analysis for one or more reasons is not included in subsequent analyses, but may still be inspected through the interface as a failed EST. Approved EST data are clustered using Phrap (12), which performs clustering and contig assembly to produce “consensus” sequences. These LIS consensus sequences aggregate the high quality sequence information of their member ESTs and are used in all subsequent analyses. Consensus sequences are analyzed using NCBI’s BLASTX [v. 2.1.3] algorithm (3-5) to search for potential homologs against NCBI’s nr database (3). Blimp-Searcher [v. 3.2] (6) and InterProScan [v. 3.1] (9) are used as previously described in Section 2.3. Each of these analyses is followed by association of Gene Ontology (GO) terms (13) with the computationally generated sequence annotation to further annotate the consensus sequences with potentially useful information. Pexfinder [v. 1.0] (14), co-developed by NCGR and OSU-OARDC and based on SignalP [v. 3.0] (15), has also been incorporated into the transcript analysis pipeline. Pexfinder (Protein Excreted) predicts proteins excreted through the plasma membrane, based on signal peptides. The results of the pipeline analyses are stored in the LIS transcript database and are updated periodically depending on the number of publicly available sequences for analysis.

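As a rough illustration of two of the screening operations named above (N-trimming and poly (A/T) trimming) followed by a length check, consider the following sketch. It is not the XGI-t implementation: the thresholds `MIN_TAIL` and `MIN_LENGTH` and both function names are hypothetical, and vector, adapter and quality-score screening are left out.

```python
import re

MIN_TAIL = 10      # assumed minimum poly(A)/poly(T) run to trim (not from the text)
MIN_LENGTH = 100   # assumed minimum length of an 'approved' EST (not from the text)


def trim_est(seq: str, min_tail: int = MIN_TAIL) -> str:
    """Remove flanking Ns, a 3' poly(A) tail and a 5' poly(T) run."""
    seq = seq.upper().strip("N")                    # N-trimming at both ends
    seq = re.sub(r"A{%d,}$" % min_tail, "", seq)    # 3' poly(A) tail
    seq = re.sub(r"^T{%d,}" % min_tail, "", seq)    # 5' poly(T) run
    return seq.strip("N")                           # Ns exposed by the trimming


def is_approved(seq: str, min_length: int = MIN_LENGTH) -> bool:
    """Length screen applied after trimming; failed ESTs would be set aside."""
    return len(trim_est(seq)) >= min_length


if __name__ == "__main__":
    raw = "NNACGT" + "A" * 12
    print(trim_est(raw))
```

Trimming before the length check matters: a read that looks long enough may consist mostly of a poly(A) tail and would otherwise pass the screen while contributing little informative sequence.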
2.5 Capabilities of the Features and Annotations Viewer

The Features and Annotations (F&A) viewer displays all the metadata and annotations for a given LIS consensus sequence. Links to sequence details, multiple sequence alignments (MSAs), EST membership data, and library and organism metadata are also provided. The F&A module also gives a graphical presentation of the annotations, linked directly to Gene Ontology where appropriate. In addition, output from the annotation details can be viewed by following the appropriate links.

3.0 Methods

3.1 Search and Display QTLs for SDS using LIS CMap.

To compare maps, we must first find and display a reference map. To begin, search for the SDS QTL feature in the database. In the CMap module, features are any elements that can be placed on a map, either as a point or an interval (See Note 1). Once the QTL has been found, its corresponding genetic map can be displayed.

Open http://www.comparative-legumes.org in a web browser and select “MAPS” from the navigation menu.
On the resulting page, select “Search”.
Enter SDS* in the “Feature Names” text box. The asterisk (*) and percent (%) signs can be used as wildcard characters. Using a wildcard ensures that all QTLs whose names begin with SDS are found, for example SDS 1-1 and SDS 8-2 (See Note 2).
Next, restrict species to “Soybean” (See Note 3).
Select the Submit button. The query should retrieve a number of results.
For this example we will examine SDS 8-2 located on the 2003 Composite Genetic Map, linkage map C2. In CMap, a map is represented as a linear arrangement of interconnected features. This is usually a single linkage group in the case of a genetic map. Related maps are grouped into map sets. Generally, these are the result of a particular study, such as the set of linkage groups produced by a genetic mapping study. The 2003 Composite Genetic Map consists of 20 linkage groups constructed by Song et al. (16) using JoinMap (17) and data from segregating progeny of Glycine max A81-356022 x Glycine soja PI468916, Minsoy x Noir1, Minsoy x Archer, Noir1 x Archer, and Clark x Harosoy. This is the most recent soybean composite map available (See Note 4). By default the results are sorted by QTL name. To find SDS 8-2 you can either page through the results by selecting “Next” or, to find the QTL more easily, sort the results by linkage map by selecting “Map Name”.
Once you have found SDS 8-2, select “Feature Details.” The Feature Details page provides helpful information about the QTL such as aliases, start and stop positions, accession ID, map information and correspondence details, as well as a cross reference to the curated data at SoyBase. By selecting “View QTL data at Soybase” you will find more attribute information such as heritability, references to curated papers, parents, sample size and type of segregating progeny.
After identifying and selecting the QTL, view it on a linkage map with the “View on Map” tool. The map view for C2 shows that SDS 8-2 lies in a feature-rich region between marker loci K418 and Satt460 (Figure 1). Different feature types are represented by different shapes (such as horizontal tick marks [for points], line intervals, boxes, arrows, etc.) or different colors. For more information on any feature, click on it to view the corresponding feature detail page (See Note 5).
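The feature-name search used in the steps above can be mimicked offline with a small sketch. It only illustrates the query rules stated in this tutorial (the * and % wildcards, and, from Note 2, comma or white-space separated names with double quotes around names containing spaces). The function names are hypothetical, and case-insensitive matching is an assumption rather than documented CMap behavior.

```python
import re
import shlex
from typing import Iterable, List


def parse_terms(query: str) -> List[str]:
    """Split a query on commas/white space, keeping "quoted names" intact."""
    return shlex.split(query.replace(",", " "))


def term_to_regex(term: str) -> "re.Pattern":
    """Translate * and % into 'match anything'; escape every other character."""
    parts = (".*" if ch in "*%" else re.escape(ch) for ch in term)
    # Case-insensitive matching is an assumption made for this sketch.
    return re.compile("".join(parts), re.IGNORECASE)


def search_features(names: Iterable[str], query: str) -> List[str]:
    """Return the feature names that fully match any term of the query."""
    patterns = [term_to_regex(term) for term in parse_terms(query)]
    return [n for n in names if any(p.fullmatch(n) for p in patterns)]


if __name__ == "__main__":
    # Feature names taken from the text; the list itself is only a toy example.
    features = ["SDS 1-1", "SDS 8-2", "K418", "Satt460"]
    print(search_features(features, "SDS*"))
    print(search_features(features, '"SDS 8-2", K418'))
```

Without the wildcard, the bare term SDS matches nothing in this toy list, which is why the step above enters SDS* rather than SDS.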

3.2 Comparing Maps

Now that a reference map has been selected and displayed, it serves as the basis for comparisons with maps from other species. Comparative maps may be added as vertically represented maps to both the left and the right of the reference map. The researcher may keep adding additional maps as long as curated comparisons are available. We will compare the C2 linkage map of soybean to genetic maps in M. truncatula. We are looking for regions of M. truncatula that are syntenic to the SDS 8-2 regions in the reference soybean map.

The C2 map will be the reference map. In the “Show Comparison Menu”, choose the drop down list for Comparative Maps (Right). This will display the new map to the right of the reference map.
From the drop down list select the genetic: barrel medic – Young (U. Minn) 2004 map [7]. The number in brackets [ ] represents the number of correspondences to the reference map. Correspondences are the number of feature matches between the reference map and the comparative map.
From the new list select 4 [2] (See Note 6).
Select the “Redraw Map” button.

The comparison between the soybean C2 linkage map and M. truncatula linkage map 4 shows a syntenic region between K365 and A538 (Figure 2). The area between these markers represents a region with potential annotated candidate genes (See Note 7).
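The bracketed correspondence counts shown in the comparison menu can be thought of as the number of feature pairs that share a name or an alias. The sketch below illustrates that idea only. It assumes each map is available as a dictionary from feature name to a set of aliases, and every locus name in it is a made-up placeholder, not LIS data.

```python
from typing import Dict, List, Set, Tuple

FeatureMap = Dict[str, Set[str]]   # feature name -> set of aliases (assumed layout)


def all_names(feature: str, aliases: Set[str]) -> Set[str]:
    """Primary name plus aliases, lower-cased for comparison."""
    return {feature.lower()} | {alias.lower() for alias in aliases}


def correspondences(reference: FeatureMap,
                    comparative: FeatureMap) -> List[Tuple[str, str]]:
    """List (reference feature, comparative feature) pairs sharing any name."""
    pairs = []
    for ref_name, ref_aliases in reference.items():
        ref_names = all_names(ref_name, ref_aliases)
        for cmp_name, cmp_aliases in comparative.items():
            if ref_names & all_names(cmp_name, cmp_aliases):
                pairs.append((ref_name, cmp_name))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy maps; names are placeholders only.
    soybean_map = {"locusA": {"mk1"}, "locusB": set(), "locusC": {"mk3"}}
    medicago_map = {"mk1": set(), "locusB": set(), "locusX": set()}
    found = correspondences(soybean_map, medicago_map)
    print(len(found), found)
```

Matching through aliases explains the situation described in Note 7, where two connected features can carry different display names on the two maps.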

3.3 Position Information to Genomic Sequence

We will now use M. truncatula physical maps to relate the position of the genetic markers to the genomic sequence.

To make the map easier to read, flip the M. truncatula linkage map by selecting the “F” located under the map label (See Note 5).
Select “Show Comparison Menu” and choose the drop down list for Comparative Maps (Right).
From the drop down list select physical: barrel medic- Cook/Kim MtGenome v3 (UC Davis) [70].
Select 1090 [2].
Select the “Redraw Map” button. The M. truncatula genetic map and the M. truncatula physical map show a correspondence at the APX (ascorbate peroxidase) feature in the region we determined above (Figure 3). APX is a rather interesting candidate gene when we consider its function in plants: ascorbate is essential to maintaining the antioxidant system that protects plants from oxidative damage due to biotic and abiotic stresses (18).
Looking at the physical map in this region, we see that there is a genomic clone that is in phase 3 of sequencing.
Click on the 021K24 clone (in blue) to view the Feature Details for this phase 3 clone.
Select “View annotated clone at LIS” to view the annotated genomic sequence.

3.4 Finding Candidate Genes within Annotated Genomic Sequences

The genomic sequence has been annotated using the LIS genomic pipeline and will provide assistance in selecting candidate genes. The Comparative Functional Genomics Browser (CFGB) tool is used to visualize the genomic annotations. The regions of interest are areas where hits from BLAST results line up with exon predictions from the GenScan prediction program. We will view the area between 92000 bp and 95000 bp.
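The visual check described above, looking for places where BLAST hits line up with GenScan exon predictions, amounts to an interval-overlap test. The following sketch shows that test under stated assumptions: annotations are available as (name, start, end) records with inclusive coordinates, and all names and coordinates are invented for illustration rather than read from the 021K24 clone.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    start: int   # inclusive bp coordinate on the contig
    end: int     # inclusive bp coordinate on the contig


def overlaps(a: Feature, b: Feature) -> bool:
    """True when two inclusive intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def supported_exons(exons: List[Feature], hits: List[Feature],
                    region_start: int, region_end: int
                    ) -> List[Tuple[str, List[str]]]:
    """Exons inside the region that are overlapped by at least one hit."""
    region = Feature("region", region_start, region_end)
    result = []
    for exon in exons:
        if not overlaps(exon, region):
            continue
        support = [hit.name for hit in hits if overlaps(exon, hit)]
        if support:
            result.append((exon.name, support))
    return result


if __name__ == "__main__":
    # Hypothetical predictions and hits; coordinates are made up.
    exons = [Feature("exon1", 92100, 92400),
             Feature("exon2", 93000, 93300),
             Feature("exon3", 96000, 96200)]
    hits = [Feature("hitA", 92350, 92600),
            Feature("hitB", 96100, 96300)]
    print(supported_exons(exons, hits, 92000, 95000))
```

An exon prediction backed by an independent similarity hit is a stronger candidate than either line of evidence alone, which is the reasoning behind restricting attention to such regions.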

Scroll over the blocks between 92000 and 95000. When you move your mouse over a feature block, the description/name of the feature pops up (Figure 4). Red blocks (colored on the web site) are BlastX hits against NCBI nr, while pink and orange blocks represent TBlastX and BlastN hits against LIS consensus sequence libraries. The various shades of green represent InterPro results and the brown blocks are the GenScan hits (See Note 8).
Scroll over the TBlastX annotations (pink boxes) until you find Gm_ 014_240664_Apr04. This set of boxes represents a soybean consensus sequence that has been annotated using sequence comparisons as a peroxisomal ascorbate peroxidase (See Note 9). Click on the pink box for Gm_ 014_240664_Apr04.
The resulting TBlastX results list the sequence similarity to the genomic sequence. Click on the hyperlinks to view the consensus sequence information annotated by the LIS transcript pipeline.
The resulting page is the Features and Annotations Page. Looking at the annotation for this consensus sequence confirms that this sequence is from soybean, has been isolated from root libraries, and has been annotated as Ascorbate Peroxidase (APX). Thus, this comparison provides evidence that our candidate gene in M. truncatula has high sequence similarity to an annotated consensus sequence, expressed in soybean.
Now that a candidate gene has been selected, we want to find the EST sequence for the gene. To do so, select “Show Membership” to get the list of EST members used to create this consensus sequence.
The resulting page lists all ESTs used in the assembly of the consensus sequence. Select “sak50g04.y1” under Sequence Name to view the EST details. The sequence detail views capture all information relevant at the individual sequence level, and other information can be accessed through this display. This includes quality trimming, base composition, as well as sequence metadata and clustering information.
To save the EST sequence, select “FASTA” from the dropdown list.
Next, select the “Download” Button. Warning! The sequence is not saved automatically.
To save the sequence, you must select File>Save Page As.
Select the “Save” button. Congratulations! You now have the EST sequence, which should provide a template for designing primers to identify polymorphic alleles. You can then screen soybean cultivars for this candidate gene or use the primers in molecular mechanism studies.
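Once the FASTA text has been saved, a quick sanity check before primer design is to read it back and look at its length and base composition. The sketch below is one minimal way to do that with the Python standard library. The record shown is a short placeholder, not the actual sak50g04.y1 sequence, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""   # first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts).upper() for n, parts in records.items()}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Placeholder record; with a saved file, pass open(path) instead of a list.
    demo = [">est_demo placeholder record", "ACGT", "GGCC"]
    for rec_name, seq in read_fasta(demo).items():
        print(rec_name, len(seq), round(gc_content(seq), 2))
```

Because the page is saved through the browser, it is worth confirming that the file begins with a ">" header line and contains only sequence characters before passing it to a primer design tool.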

4.0 Notes:

1. For a complete set of features in CMap please see http://www.comparative-legumes.org/cgi-bin/cmap/feature_type_info.
2. If you are searching for multiple names, separate them with commas or white space. To find features with spaces in the name, surround the name in double quotes, e.g., "abc 123".
3. We did not select “Restrict Feature Type” because, by default, ALL feature types are searched. Since there are a number of possible QTL feature types, it is recommended that you query against ALL feature types, unless you are certain of the feature type, to improve your chances of finding the Feature Name.
4. As was done for the 1999 composite maps (19), the SoyBase staff interpolated markers from all other published mapping studies onto each linkage group using a proportional relationship between anchor loci (i.e. loci in common between the Cregan maps and the other population). Although this method allows the inclusion of loci from many different mapping studies for which the original segregation data are not available, it results in maps where the exact order of closely spaced loci may be incorrect. For this reason, the order of and genetic distances between closely spaced loci should be considered approximate rather than exact.
5. CMap offers researchers a number of functions to improve map layouts. For general help using the CMap interface please see http://www.comparative-legumes.org/cgi-bin/cmap/help?section=map_viewer.
6. Since our goal is to find syntenic regions between maps, it is best to select maps with more than one correspondence.
7. You may notice that the names of corresponding features (highlighted in red and connected by a light blue line) in soybean do not match those in Medicago truncatula. This is because these features have more than one alias.
8. For general help and options using the Comparative Functional Genomics Browser, please see http://www.comparative-legumes.org/lis/lis_help.html.
9. TBlastX compares the six-frame translations of the genomic sequence against the six-frame translations of the LIS consensus sequence data set.
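To make the term "six-frame translation" in the last note concrete, the sketch below translates a nucleotide sequence in the three forward frames and the three frames of its reverse complement, using the standard genetic code. It illustrates the concept only and is not TBLASTX itself. The frame labels (+1 to +3, -1 to -3) and the use of "X" for codons containing ambiguous bases are conventions chosen for this example.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq: str) -> Dict[str, str]:
    """Translations for frames +1..+3 and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

Comparing translated sequences rather than nucleotides lets related coding regions be detected across species even when synonymous substitutions have changed the DNA, which suits a cross-species comparison such as Medicago genomic sequence against soybean consensus sequences.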

Acknowledgements:

We would like to thank David Grant for his careful curation of map data. This work is supported by USDA-ARS Specific Cooperative Agreement #3625-21000-038-01.

References

1. Gonzales, M. D., Archuleta, E., Farmer, A., Gajendran, K., Grant, D., Shoemaker, R., Beavis, W. D., and Waugh, M. E. (2005) The Legume Information System (LIS): an integrated information resource for comparative legume biology. Nucleic Acids Res 33, D660-5.
2. Njiti, V. N., Meksem, K., Iqbal, M. J., Johnson, J. E., Kassem, M. A., Zobrist, K. F., Kilo, V. Y., and Lightfoot, D. A. (2002) Common loci underlie field resistance to soybean sudden death syndrome in Forrest, Pyramid, Essex, and Douglas. Theor Appl Genet 104, 294-300.
3. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., and Wheeler, D. L. (2004) GenBank: update. Nucleic Acids Res 32, D23–D26.
4. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J Mol Biol 215, 403–410.
5. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402.
6. Henikoff, S. and Henikoff, J. G. (1994) Protein family classification based on searching a database of blocks. Genomics 19, 97–107.
7. Henikoff, J. G., Greene, E. A., Pietrokovski, S., and Henikoff, S. (2000) Increased coverage of protein families with the blocks database servers. Nucleic Acids Res 28, 228–230.
8. Henikoff, S., Henikoff, J. G., and Pietrokovski, S. (1999) Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics 15, 471–479.
9. Zdobnov, E. M. and Apweiler, R. (2001) InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848.
10. Mulder, N. J., Apweiler, R., Attwood, T. K., Bairoch, A., Barrell, D., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P. et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 31, 315–318.
11. Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268, 78–94.
12. Green, P. (1993) Laboratory of Phil Green, University of Washington. http://www.phrap.org.
13. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet 25, 25–29.
14. Torto, T. A., Li, S., Styer, A., Huitema, E., Testa, A., Gow, N. A., van West, P., and Kamoun, S. (2003) EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen Phytophthora. Genome Res 13, 1675–1685.
15. Bendtsen, J. D., Nielsen, H., von Heijne, G., and Brunak, S. (2004) Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340, 783–795.
16. Song, Q. J., Marek, L. F., Shoemaker, R. C., Lark, K. G., Concibido, V. C., Delannay, X., Specht, J. E., and Cregan, P. B. (2004) A new integrated genetic linkage map of the soybean. Theor Appl Genet 109, 122-8.
17. Stam, P. (1993) Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. Plant Journal 3, 739-744.
18. Shigeoka, S., Ishikawa, T., Tamoi, M., Miyagawa, Y., Takeda, T., Yabuta, Y., and Yoshimura, K. (2002) Regulation and function of ascorbate peroxidase isoenzymes. J Exp Bot 53, 1305-19.
19. Cregan, P. B., Jarvik, T., Bush, A. L., Shoemaker, R. C., Lark, K. G., Kahler, A. L., Kaya, N., VanToai, T. T., Lohnes, D. G., Chung, J., and Specht, J. E. (1999) An integrated genetic linkage map of the soybean genome. Crop Sci 39, 1464-1490.

Figure Captions:

Figure 1: The soybean linkage map C2 from the 2003 soybean composite map is visualized using CMap. The QTL for SDS 8-2 is found in a feature-rich region at about 120 cM.
Figure 2: Comparing the C2 linkage map from soybean against linkage map 4 from barrel medic shows a syntenic region between markers DK413L and DK447R. Potential candidate genes for SDS 8-2 lie within this syntenic region.
Figure 3: In the regions of the genome where syntenic relationships exist between barrel medic and soybean, the annotated genomic sequence from barrel medic can be leveraged to identify candidate genes of interest.
Figure 4: The LIS Comparative Functional Genomics Browser allows the visualization of genomic analysis results, including comparative transcript data, with the gene-sequences aligned to genomic contigs to validate gene-predictions. The regions of interest are areas where hits from BLAST results line up with exon predictions from the GenScan prediction program.
