
Supplementary Materials for




Whole genome re-sequencing of non-model organisms: lessons from unmapped reads
Anaïs GOUIN, Fabrice LEGEAI, Pierre NOUHAUD, Annabel WHIBLEY, Jean-Christophe SIMON, Claire LEMAITRE




1. Description of the data


Individual sampling and assignment

Pea aphids were collected in late August 2011 within an area 30 km in diameter in Eastern France, on different plants known to harbor ten of the pea aphid biotypes (biotypes associated with Cytisus scoparius, Lathyrus pratensis, Lotus corniculatus, Melilotus spp., Medicago lupulina, Ononis spinosa, Securigera varia, Vicia cracca, Medicago sativa and Trifolium pratense). Because their host plant had already been harvested by that time, individuals belonging to the Pisum sativum biotype were sampled in Western France in November 2011. All individuals were genotyped at seven microsatellite loci (AlA09M, AlB07M, AlB08M, AlB12M, ApF08M, ApH08M and ApH10M, see Caillaud et al, 2004) following Peccoud et al (2008). Individuals with multiple clonal copies of the same genotype were discarded from the dataset on the basis of this multilocus genotype. The clustering software STRUCTURE (Pritchard et al, 2000) was then used to identify individuals showing good assignment to the genetic cluster associated with their collection plant, excluding migrants from other plants and possible hybrids. The number of clusters was set to K = 11; 100,000 MCMC iterations were run after a burn-in period of 25,000 iterations, and analyses were performed with the admixture model and without any prior information on the sampling plant or site. In parallel, symbiont typing was carried out for each individual through diagnostic PCR using specific primers and following Peccoud et al (2014).
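The clonal-copy filter described above can be illustrated with a small sketch. It is not the authors' script: the input layout (tab-separated, individual ID in column 1, one genotype per locus in the following columns) and the function name dedupe_clones are hypothetical, and the sketch assumes that one representative per multilocus genotype is kept, which the text does not specify.

```shell
# Keep the first individual seen for each multilocus genotype.
# Hypothetical input layout: tab-separated, column 1 = individual ID,
# columns 2..NF = genotype at each microsatellite locus.
dedupe_clones() {
    awk -F'\t' '{
        key = $2
        for (i = 3; i <= NF; i++) key = key FS $i   # build the multilocus genotype
        if (!(key in seen)) { seen[key] = 1; print }
    }'
}

# Usage (file names are placeholders):
#   dedupe_clones < genotypes.tsv > unique_genotypes.tsv
```

Keeping the first copy is an arbitrary choice; any rule that retains one individual per genotype would produce the same set of distinct genotypes.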


Individual selection and whole-genome resequencing

For each biotype, three different individuals were selected on the basis of their STRUCTURE assignment score (mean 92.3 %, min. 78.0 %). The characteristics of the 33 individual genotypes that were included in the re-sequencing project are given in Table S1. To obtain a sufficient amount of genetic material, DNA was extracted from three fourth-instar (clonal) larvae per individual using the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer's instructions. DNA concentration was determined for each sample by spectrometry after RNAse treatment using a PHERAstar (BMG Labtech). Each of the 33 samples was processed according to the standard Illumina protocol for preparing libraries for paired-end sequencing (mean insert size of 250 bp). Libraries were sequenced with a 100-bp paired-end run on an Illumina HiSeq2000.
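The mean and minimum assignment scores quoted above can be recomputed from the 33 assignment coefficients listed in Table S1. The following is only a checking sketch; the function name summarize_scores is an assumption, and the values are copied from the table in its row order (Cs1 to Ps3).

```shell
# Read one assignment coefficient per line (values between 0 and 1)
# and print the mean and the minimum as percentages.
summarize_scores() {
    awk 'NF {
        sum += $1; n++
        if (n == 1 || $1 < min) min = $1
    }
    END { if (n) printf "mean=%.1f min=%.1f\n", 100 * sum / n, 100 * min }'
}

# Coefficients from Table S1, in table order (Cs1 ... Ps3).
printf '%s\n' 0.95 0.93 0.96 0.94 0.95 0.88 0.96 0.97 0.97 0.86 0.94 \
    0.87 0.95 0.97 0.96 0.78 0.91 0.94 0.97 0.96 0.97 0.96 0.92 0.94 \
    0.93 0.93 0.91 0.93 0.82 0.81 0.92 0.84 0.96 | summarize_scores
# prints: mean=92.3 min=78.0
```

The coefficients sum to 30.46 over 33 individuals, which gives the 92.3 % mean; the minimum, 0.78, belongs to individual Ms1.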



2. Pipeline: command lines used


Initial mapping:

bowtie2 --non-deterministic --rg-id $ID --rg $SM --rg $LB --rg $PL -x reference_index -1 fastq1.fq -2 fastq2.fq -S output.sam
Unmapped read trimming:

prinseq-lite.pl -verbose -fastq unmapped.fastq -trim_ns_right 0 -trim_qual_right 20 -trim_qual_window 10 -trim_qual_step 2 -trim_qual_type mean -min_len 66
Compareads:

./Compareads1.2_Beta5.sh -a fastq1.fq -b fastq2.fq -k 33 -t 2 -s 0
Assembly using ABySS:

ABYSS -k31 fastq.fq -o $out
Mapping with Stampy:

python stampy.py -g references.index -h references.index -M fastq1.fq fastq2.fq > out.sam

SNP calling pipeline (GATK):

1/ Duplicate removal:

java -Xmx4g -jar MarkDuplicates.jar INPUT=file.bam OUTPUT=out.markdup.bam METRICS_FILE=out.markdup.metrics REMOVE_DUPLICATES=true ASSUME_SORTED=true MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=800

2/ Indel realignment:



java -Xmx4g -jar GenomeAnalysisTK.jar -I out.markdup.bam -R references.fasta -T RealignerTargetCreator -o out.forIndelRealigner.intervals

java -Xmx4g -Djava.io.tmpdir=tmp/ -jar GenomeAnalysisTK.jar -I out.markdup.bam -R references.fasta -T IndelRealigner -targetIntervals out.forIndelRealigner.intervals -o out.realigned.bam

3/ Unified Genotyper:



java -Xmx4g -jar GenomeAnalysisTK.jar -R references.fasta -T UnifiedGenotyper -I out.realigned.bam [-I ...] -o out.vcf
Assembly using SPAdes:

spades.py -s reads.fa -k 31,41,63,81,89 -o out
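
The command list does not show how unmapped.fastq was derived from the initial mapping output. As a hedged illustration only (the authors' actual extraction step is not given, and a dedicated tool such as samtools would normally be used), the sketch below selects SAM records whose FLAG has the 0x4 "read unmapped" bit set and writes them as FASTQ. The function name sam_unmapped_to_fastq is an assumption.

```shell
# Print FASTQ records for reads whose SAM FLAG has bit 0x4 (read unmapped) set.
# Reads SAM text on stdin; header lines (starting with "@") are skipped.
# Columns used: 1 = read name, 2 = FLAG, 10 = sequence, 11 = base qualities.
sam_unmapped_to_fastq() {
    awk -F'\t' '
        /^@/ { next }
        int($2 / 4) % 2 == 1 {
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }'
}

# Usage with the file names from the commands above:
#   sam_unmapped_to_fastq < output.sam > unmapped.fastq
```

This sketch does not append mate suffixes to read names, so both mates of an unmapped pair are written under the same name.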

3. 16S rRNA phylogenetic analyses of Rickettsia and Spiroplasma symbionts


Two phylogenetic analyses based on 16S rDNA sequences were carried out to establish the placement of the two symbiont genomes (Rickettsia and Spiroplasma) revealed in the unmapped read sets of some pea aphid individuals.

3.a Rickettsia 16S phylogeny

To obtain the 16S ribosomal RNA gene sequence of the Rickettsia genome present in our pea aphid data, we used individual Ps2 (the individual of the P. sativum biotype with the highest coverage after mapping to the Rickettsia bellii genome). Reads from Ps2 that mapped to the 16S gene of the Rickettsia bellii genome were extracted and then assembled using Minia (Chikhi and Rizk, 2012). The assembly yielded a single contig of 1640 bp.

16S ribosomal RNA genes from all available Rickettsia species with a completely sequenced genome were collected from NCBI (NR_044656.1, NR_103923.1, NR_074480.1, NR_074394.1, NR_118678.1, NR_074496.1, NR_074474.1, NR_074486.1, NR_074485.1, NR_025967.1, NR_074472.1, NR_025921.1, NR_074497.1, NR_074470.1, NR_074459.1, NR_074469.1, NR_074471.1, NR_074488.1, NR_074527.1, NR_074483.1, NR_074487.1) and from the SILVA database (Quast et al, 2013) (ACLC01000066, CP000849). The Phylogeny.fr pipeline (Dereeper et al, 2008) was used to infer a maximum likelihood phylogenetic tree (multiple alignment with MUSCLE, alignment cleaning with Gblocks, then tree inference with PhyML).

The resulting phylogenetic tree (Fig. S1) confirms that the closest relative of the Rickettsia pea aphid symbiont is Rickettsia bellii.



Figure S1: Phylogeny of Rickettsia based on 16S rRNA gene sequences. The 16S sequence assembled from pea aphid individual Ps2 is highlighted in red. Branch support values are indicated in % in dark blue on top of each branch.


3.b Spiroplasma 16S phylogeny

To obtain the 16S ribosomal RNA gene sequence of the symbiont present in the V. cracca biotype, the 16S sequence of Spiroplasma melliferum KC3 was aligned with BLAST to the contigs obtained from de novo assembly. A unique match with 99 % identity was obtained, resulting in a sequence of 1444 bp.

Other Spiroplasma and outgroup species were selected based on a recent phylogeny of the Spiroplasma genus (Ku et al, 2013): the 10 Spiroplasma species with a completely sequenced genome and two species without a reference genome (NCBI accessions NR_121737.1, NR_103945.1, NR_121738.1, NR_121708.1, NR_036849.1, NR_121701.1, NR_121702.1, NR_121794.1, NR_103946.1, GU585671 and GU993266, and SILVA database accession AGBZ01000004). Three outgroup sequences from the Mycoplasma and Phytoplasma genera were gathered from the MolliGen database (Barré et al, 2004). Finally, the 16S sequence of a Spiroplasma symbiont reported in a Japanese pea aphid strain (named Spiroplasma sp. SM) was added (Fukatsu et al, 2001, NCBI accession AB048263). The Phylogeny.fr pipeline (Dereeper et al, 2008) was used to build the phylogenetic tree.

The resulting phylogenetic tree (Fig. S2) is consistent with previous analyses based on 16S rDNA (Ku et al, 2013). Importantly, it confirms that the symbiont we assembled belongs to the Spiroplasma genus and that no close relative with a complete genome is available in the databases.

Figure S2: Phylogeny of the Spiroplasma genus based on 16S rRNA gene sequences. The 16S sequence retrieved from the assembly of the pea aphid symbiont in the V. cracca biotype is highlighted in red. Species whose complete genome sequence is not available are indicated in grey. Branch support values are indicated in % in dark blue on top of each branch.



4. Additional tables and figures


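Symbionts: columns Spiroplasma through B. aphidicola (● present, - absent)

Individual genotype | Plant origin | Assignment coefficient (STRUCTURE) | Spiroplasma | R. insecticola | H. defensa | Rickettsiella | S. symbiotica | Rickettsia | PAXS | Wolbachia | B. aphidicola
Cs1 | C. scoparius | 0.95 | - | - | - | - | - | - | - | - | ●
Cs2 | C. scoparius | 0.93 | - | - | - | - | - | - | - | - | ●
Cs3 | C. scoparius | 0.96 | - | - | - | - | - | - | - | - | ●
Lc1 | L. corniculatus | 0.94 | - | - | - | - | - | - | - | - | ●
Lc2 | L. corniculatus | 0.95 | - | - | - | - | - | - | - | - | ●
Lc3 | L. corniculatus | 0.88 | - | - | - | - | - | - | - | - | ●
Lp1 | L. pratensis | 0.96 | - | - | - | - | - | - | - | - | ●
Lp2 | L. pratensis | 0.97 | - | - | - | - | - | - | - | - | ●
Lp3 | L. pratensis | 0.97 | - | - | - | - | - | - | - | - | ●
Ml1 | M. lupulina | 0.86 | - | - | - | - | - | ● | - | - | ●
Ml2 | M. lupulina | 0.94 | - | - | - | - | - | ● | - | - | ●
Ml3 | M. lupulina | 0.87 | - | - | - | - | - | ● | - | - | ●
Mo1 | M. officinalis | 0.95 | - | - | - | - | - | - | - | - | ●
Mo2 | M. officinalis | 0.97 | - | - | - | - | - | - | - | - | ●
Mo3 | M. officinalis | 0.96 | - | - | - | - | - | - | - | - | ●
Ms1 | M. sativa | 0.78 | - | - | ● | - | - | - | - | - | ●
Ms2 | M. sativa | 0.91 | - | - | ● | ● | - | - | - | - | ●
Ms3 | M. sativa | 0.94 | - | - | ● | - | - | - | - | - | ●
Os1 | O. spinosa | 0.97 | - | - | ● | - | - | - | - | - | ●
Os2 | O. spinosa | 0.96 | - | - | - | - | - | - | - | - | ●
Os3 | O. spinosa | 0.97 | - | - | ● | - | - | - | - | - | ●
Sv1 | S. varia | 0.96 | - | - | - | - | ● | - | - | - | ●
Sv2 | S. varia | 0.92 | - | - | - | - | ● | - | - | - | ●
Sv3 | S. varia | 0.94 | - | - | - | - | ● | - | - | - | ●
Tp1 | T. pratense | 0.93 | - | ● | - | - | - | - | - | - | ●
Tp2 | T. pratense | 0.93 | - | ● | - | - | - | - | - | - | ●
Tp3 | T. pratense | 0.91 | ● | ● | - | - | - | - | - | - | ●
Vc1 | V. cracca | 0.93 | ● | - | - | - | ● | - | - | - | ●
Vc2 | V. cracca | 0.82 | - | - | - | - | - | - | - | - | ●
Vc3 | V. cracca | 0.81 | ● | - | - | - | ● | - | - | - | ●
Ps1 | P. sativum | 0.92 | - | - | - | - | ● | ● | - | - | ●
Ps2 | P. sativum | 0.84 | - | - | - | - | - | ● | - | - | ●
Ps3 | P. sativum | 0.96 | - | - | - | - | - | - | - | - | ●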

Table S1: Characteristics of the 33 individual genotypes selected for whole-genome re-sequencing. Black dots indicate, for each individual genotype, the presence of one or several of the 9 symbionts (1 obligate and 8 facultative) reported for the pea aphid, as detected with symbiont-specific PCR tests. The obligate symbiont Buchnera is present in all pea aphid genotypes and its PCR-based detection is used as a positive control.

Table S2: Coverage for each individual obtained after initial mapping (Bowtie2) of paired-end read sets onto a set of reference genomes (Acyrthosiphon pisum, mitochondrial genome, Buchnera aphidicola, Spiroplasma melliferum KC3, Hamiltonella defensa 5AT, Rickettsiella grylli, Regiella insecticola R5.15, Wolbachia sp. strain wRi, Rickettsia sp. endosymbiont of Ixodes scapularis, Serratia symbiotica str. Tucson). Coverage obtained after mapping of unmapped reads to Rickettsia bellii and to annotated contigs of the Spiroplasma draft is also provided. Coverage higher than 2x is highlighted in grey and the sizes of the reference genomes are indicated alongside the corresponding names. PCR test indicates the presence (+) or absence (-) of the facultative symbionts of A. pisum based on their detection with species-specific primers.



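Individual | nDNA A. pisum | mtDNA A. pisum | Buchnera aphidicola | Spiroplasma melliferum | Spiroplasma A. pisum (partial) | PCR test | Hamiltonella defensa | PCR test | Rickettsiella grylli | PCR test | Regiella insecticola | PCR test | Wolbachia strain wRi | PCR test | Rickettsia ixodes | Rickettsia bellii | PCR test | Serratia symbiotica | PCR test
Genome size | 530 Mb | 17 kb | 600 kb | 1.29 Mb | 780 kb |  | 2.11 Mb |  | 1.58 Mb |  | 2 Mb |  | 1.45 Mb |  | 2.1 Mb | 1.5 Mb |  | 2.60 Mb | 
Ms1 | 14.3 | 283.94 | 248.25 | 0 | 0 | - | 103.59 | + | 0.21 | + | 0.65 | - | 0 | - | 0 | 0 | - | 0.07 | -
Ms2 | 12.34 | 384.54 | 534.9 | 0 | 0 | - | 117.7 | + | 0 | - | 2.76 | + | 0 | - | 0 | 0 | - | 0.27 | -
Ms3 | 12.93 | 541.43 | 557.86 | 0 | 0 | - | 14.06 | + | 0 | - | 0.05 | - | 0 | - | 0 | 0.01 | - | 0.02 | -
Tp1 | 11.85 | 365.78 | 297.77 | 0 | 0.13 | - | 0.53 | - | 0 | - | 52.96 | + | 0 | - | 0 | 0 | - | 0.02 | -
Tp2 | 12.77 | 677.67 | 427.31 | 0 | 0.21 | - | 0.22 | - | 0 | - | 38.64 | + | 0 | - | 0 | 0 | - | 0.02 | -
Tp3 | 13.71 | 2080.8 | 1501.75 | 0.21 | 273.13 | + | 0.1 | - | 0 | - | 35.4 | + | 0 | - | 0 | 0 | - | 0.01 | -
Vc1 | 16.84 | 1241.77 | 900.81 | 0.13 | 277.65 | + | 0.03 | - | 0 | - | 0.08 | - | 0 | - | 0 | 0.03 | - | 5.23 | +
Vc2 | 11.91 | 733.97 | 702.45 | 0 | 0.12 | - | 0.01 | - | 0 | - | 0.02 | - | 0 | - | 0 | 0 | - | 0 | -
Vc3 | 11.36 | 777.87 | 726.5 | 0.71 | 1185.31 | + | 0.09 | - | 0 | - | 0.01 | - | 0 | - | 0 | 0.01 | - | 6.69 | +
Ps1 | 14.25 | 1846.42 | 820.04 | 0 | 0.28 | - | 0.04 | - | 0 | - | 0.02 | - | 0 | - | 0.91 | 13.51 | + | 7.65 | +
Ps2 | 12.23 | 392.91 | 138.08 | 0 | 0.61 | - | 0.01 | - | 0 | - | 0.01 | - | 0 | - | 3.67 | 59.26 | + | 0.07 | -
Ps3 | 11.95 | 483.42 | 294.02 | 0 | 0.07 | - | 0.09 | - | 0 | - | 0.19 | - | 0 | - | 0 | 0.01 | - | 0 | -
Ml1 | 15.84 | 540.63 | 623.68 | 0 | 0 | - | 0.04 | - | 0 | - | 0.02 | - | 0 | - | 0.47 | 8.01 | + | 0 | -
Ml2 | 17.91 | 1161.81 | 901.62 | 0 | 0 | - | 0 | - | 0 | - | 0.01 | - | 0 | - | 3.51 | 54.33 | + | 0 | -
Ml3 | 10.42 | 257.09 | 170.1 | 0 | 0 | - | 0 | - | 0 | - | 0.01 | - | 0 | - | 0.63 | 10.71 | + | 0 | -
Lc1 | 13.56 | 387.48 | 182.24 | 0 | 0.63 | - | 0 | - | 0 | - | 0.12 | - | 0 | - | 0 | 0 | - | 0 | -
Lc2 | 19.96 | 2046.81 | 1290.62 | 0 | 0.10 | - | 0 | - | 0 | - | 0.01 | - | 0 | - | 0 | 0 | - | 0 | -
Lc3 | 13.79 | 797.32 | 530.88 | 0 | 2.05 | - | 0 | - | 0 | - | 0.01 | - | 0 | - | 0 | 0 | - | 0.02 | -
Mo1 | 13.37 | 1338.12 | 1091.21 | 0 | 0 | - | 0.04 | - | 0 | - | 0.01 | - | 0 | - | 0 | 0 | - | 0 | -
Mo2 | 20.16 | 971.29 | 1269.51 | 0 | 0 | - | 0.06 | - | 0 | - | 0.02 | - | 0 | - | 0 | 0 | - | 0 | -
Mo3 | 14.21 | 406.15 | 436.51 | 0 | 0 | - | 0.03 | - | 0 | - | 0.01 | - | 0 | - | 0 | 0 | - | 0 | -
Sv1 | 16.03 | 696.52 | 621.32 | 0 | 0.02 | - | 0.01 | - | 0 | - | 0.02 | - | 0 | - | 0 | 0 | - | 0.01 | -
Sv2 | 15.79 | 444.72 | 464.11 | 0 | 0 | - | 0.05 | - | 0 | - | 0.01 | - | 0 | - | 0 | 0 | - | 9.01 | +
Sv3 | 13.52 | 1690.28 | 985.69 | 0 | 0 | - | 0.21 | - | 0 | - | 0.02 | - | 0 | - | 0 | 0 | - | 14.84 | +
Cs1 | 10.6 | 771.1 | 933.57 | 0 | 0 | - | 0 | - | 0 | - | 0.08 | - | 0 | - | 0 | 0 | - | 0 | -
Cs2 | 15.1 | 3248.6 | 1509.03 | 0 | 0 | - | 0 | - | 0 | - | 0.01 | - | 0 | - | 0 | 0 | - | 0 | -
Cs3 | 17.53 | 1389.32 | 1349.63 | 0 | 0 | - | 0 | - | 0 | - | 0.13 | - | 0 | - | 0 | 0 | - | 0 | -
Os1 | 15.41 | 1067.73 | 836.98 | 0 | 0 | - | 2.42 | + | 0 | - | 0.01 | - | 0 | - | 0 | 0.01 | - | 0.01 | -
Os2 | 16.12 | 1295.36 | 1097.57 | 0 | 0 | - | 0.09 | - | 0 | - | 0.01 | - | 0 | - | 0.01 | 0.05 | - | 0 | -
Os3 | 12.34 | 331.39 | 1459.96 | 0 | 0.01 | - | 49.97 | + | 0 | - | 0.47 | - | 0 | - | 0.02 | 0 | - | 0.35 | -
Lp1 | 14.95 | 796.29 | 602.21 | 0 | 0 | - | 0 | - | 0 | - | 0.02 | - | 0 | - | 0 | 0 | - | 0.02 | -
Lp2 | 13.89 | 918.28 | 545.64 | 0 | 0 | - | 0.01 | - | 0 | - | 0.02 | - | 0 | - | 0 | 0 | - | 0 | -
Lp3 | 14.64 | 850.72 | 657.28 | 0 | 0 | - | 0 | - | 0 | - | 0.02 | - | 0 | - | 0 | 0 | - | 0 | -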
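
The Table S2 caption flags coverage above 2x and reports a PCR result for each facultative symbiont. As a reading aid only, the sketch below compares the two for one symbiont column at a time and lists the individuals where they disagree. The function name flag_discordant, the three-field input layout and the use of the 2x threshold as a presence call are assumptions, not the authors' procedure.

```shell
# Input: whitespace-separated lines "individual coverage pcr", pcr being + or -.
# Prints the lines where the coverage-based call (coverage > threshold,
# default 2) disagrees with the PCR result.
flag_discordant() {
    awk -v min_cov="${1:-2}" '{
        seq_call = ($2 > min_cov) ? "+" : "-"
        if (seq_call != $3) print $1, $2, $3
    }'
}

# "Spiroplasma A. pisum (partial)" coverage and PCR test for six
# individuals, values copied from Table S2.
printf '%s\n' 'Tp3 273.13 +' 'Vc1 277.65 +' 'Vc2 0.12 -' 'Vc3 1185.31 +' \
    'Lc3 2.05 -' 'Ps2 0.61 -' | flag_discordant
# prints: Lc3 2.05 -
```

In this subset only Lc3 stands out, with coverage just above 2x despite a negative PCR test; a different threshold can be passed as the first argument to the function.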

Figure S3: Percentages of similar reads between one individual and all others. Each dot represents the percentage of similarity between the studied individual and each of the remaining 32 individuals. Red dots correspond to comparisons between two individuals of the same biotype.



Literature cited

Barré A, de Daruvar A, Blanchard A (2004). MolliGen, a database dedicated to the comparative genomics of Mollicutes. Nucleic Acids Res 32 (Database issue): D307-D310. URL: http://www.molligen.org
Caillaud MC, Mondor-Genson G, Levine-Wilkinson S, Mieuzet L, Frantz A, Simon JC et al (2004). Microsatellite DNA markers for the pea aphid Acyrthosiphon pisum. Mol Ecol Notes 4(3): 446-448.
Chikhi R, Rizk G (2012). Space-efficient and exact De Bruijn graph representation based on a Bloom filter. In WABI, volume 7534 of Lecture Notes in Computer Science, pages 236–248. Springer.
Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard JF, Guindon S, Lefort V, Lescot M, Claverie JM, Gascuel O (2008). Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res 36 (Web Server issue): W465-W469.
Fukatsu T, Tsuchida T, Nikoh N, Koga R (2001). Spiroplasma symbiont of the pea aphid, Acyrthosiphon pisum (Insecta: Homoptera). Appl Environ Microbiol 67(3): 1284-1291.
Ku C, Lo WS, Chen LL, Kuo CH (2013). Complete genomes of two dipteran-associated Spiroplasmas provided insights into the origin, dynamics, and impacts of viral invasion in Spiroplasma. Genome Biol Evol 5(6): 1151-1164.
Peccoud J, Figueroa CC, Silva AX, Ramirez CC, Mieuzet L, Bonhomme J et al (2008). Host range expansion of an introduced insect pest through multiple colonizations of specialized clones. Mol Ecol 17(21): 4608-4618.
Peccoud J, Bonhomme J, Mahéo F, de la Huerta M, Cosson O, Simon JC (2014). Inheritance patterns of secondary symbionts during sexual reproduction of pea aphid biotypes. Insect Sci 21: 291–300.
Pritchard JK, Stephens M, Donnelly P (2000). Inference of population structure using multilocus genotype data. Genetics 155: 945-959.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO (2013). The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41 (D1): D590-D596.

