Example execution¶
Create a directory for your experiment:
mkdir test_virAnnot
cd test_virAnnot
Copy example read files, Illumina adapters fasta file, the sample id mapping file, the step and parameter file:
cp /path/to/virAnnot/examples/reads/*.fq .
cp /path/to/virAnnot/examples/adapters.fa .
cp /path/to/virAnnot/examples/map.txt .
cp /path/to/virAnnot/examples/step.yaml .
cp /path/to/virAnnot/examples/parameters.yaml .
You have to modify this file to fit your configuration.
This example contains all modules and options available and must be used as a template for your own analysis.
Step ReadSoustraction¶
This module uses bowtie2 to map reads against nucleotide sequence and Samtools to extract unmapped pairs.
Corresponding step.yaml
section:
ReadSoustraction_phiX:
i1: (file1)
i2: (file2)
db: phiX
o1: (library)_phiX.r1.fq
o2: (library)_phiX.r2.fq
sge: True
n_cpu: 5
iter: library
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n ReadSoustraction_phiX
Step Demultiplex¶
This module uses cutadapt demultiplex reads from library and also trim reads from adapters and chimeric reads.
Each demultiplexing step are described in the module section.
Corresponding step.yaml
section:
Demultiplex:
i1: (library)_phiX.r1.fq
i2: (library)_phiX.r2.fq
adapters: adapters.fna
middle: 1
min_qual: 20
polyA: True
min_len: 70
iter: library
sge: True
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Demultiplex
Step DemultiplexHtml¶
This module gather all *_demultiplex.stats.csv
files and create and html report.
Corresponding step.yaml
section:
DemultiplexHtml:
csv: (library)_demultiplex.stats.csv
id: (library)
out: stat_demultiplex
iter: global
sge: True
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n DemultiplexHtml
Output example:
Step Diamond¶
This module launch Diamond similarity search for reads and produce an XML file per sample.
n_cpu: 16
Map_spades:
contigs: (SampleID)_spades.scaffold.fa
i1: (SampleID)_truePairs_r1.fq
i2: (SampleID)_truePairs_r2.fq
bam: (SampleID)_spades.scaffold.bam
rn: (SampleID)_spades.scaffold.rn
sge: True
n_cpu: 16
Diamond:
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Diamond
Step Assembly¶
This module simply launch udba_ud, newbler and metaspades assemblers in each sample folder, rename scaffolds id and move the resulting fasta file.
n_cpu: 5
sge: True
drVM:
i1: (SampleID)_truePairs_r1.fq
i2: (SampleID)_truePairs_r2.fq
n_cpu: 20
identity: 70
min_len: 300
sge: True
Assembly_idba:
prog: idba
n_cpu: 5
i1: (SampleID)_truePairs_r1.fq
i2: (SampleID)_truePairs_r2.fq
Test both idba and spades:
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Assembly_idba
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Assembly_spades
Example of idba assembly:
>ds2015-149_0
GTGTAAGGTGGTGAAGG...
>ds2015-149_1
CCTGCGAATTGGGCCAA...
Step Map¶
This module uses bowtie2 to map reads back on assemblies and samtools will count reads per contig.
Step file:
out: (SampleID)_idba.scaffold.fa
sge: True
Assembly_spades:
prog: spades
n_cpu: 5
i1: (SampleID)_truePairs_r1.fq
i2: (SampleID)_truePairs_r2.fq
out: (SampleID)_spades.scaffold.fa
sge: True
Map_idba:
contigs: (SampleID)_idba.scaffold.fa
i1: (SampleID)_truePairs_r1.fq
i2: (SampleID)_truePairs_r2.fq
bam: (SampleID)_idba.scaffold.bam
rn: (SampleID)_idba.scaffold.rn
sge: True
Command:
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Map_idba
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Map_spades
Output a two column tabular file, column 1 sequence ID, column 2 number of reads.
Example of .rn
file produce:
ds2015-149_0 1179
ds2015-149_1 444
ds2015-149_10 26
ds2015-149_11 44
ds2015-149_12 14
ds2015-149_13 4
ds2015-149_14 6
Step Blast¶
This module is able to launch Blast(s) against provided databases localy or remotely.
The script blast_launch.py must be present on distant servers and parameter.yaml
modified to fit your servers.
Step file:
i: (SampleID)_idba.scaffold.dmdx.nr.csv
contigs: (SampleID)_idba.scaffold.dmdx2bltx.fa
out: (SampleID)_idba.scaffold.dmdx2bltx.nr.xml
type: blastx
db: nr
evalue: 0.0001
server: genologin
n_cpu: 8
tc: 50
num_chunk: 1000
max_target_seqs: 1
sge: True
Blast_allvirTX:
type: tblastx
contigs: (SampleID)_idba.scaffold.fa
db: all_vir_nucl
out: (SampleID)_idba.scaffold.tbltx.all_vir.xml
evalue: 0.0001
server: genotoul
n_cpu: 8
sge: True
num_chunk: 1000
tc: 50
Blast_nr:
type: blastx
contigs: (SampleID)_idba.scaffold.fa
db: nr
out: (SampleID)_idba.scaffold.bltx.nr.xml
evalue: 0.0001
server: genotoul
n_cpu: 8
tc: 50
num_chunk: 1000
max_target_seqs: 1
sge: True
Blast_refvirTX:
type: tblastx
contigs: (SampleID)_idba.scaffold.fa
db: refseq_vir_nucl
out: (SampleID)_idba.scaffold.tbltx.refseq_vir.xml
evalue: 0.0001
server: genotoul
n_cpu: 8
Commands:
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast_nr
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast_refvirTX
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast_allvirTX
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast_RPS
Step Blast2ecsv¶
This module uses the XML file generated by the corresponding Blast module and the taxonomy contained in the SQLITE database to create a csv file containing match options, taxonomy string and sequences.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast2ecsv_nr
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast2ecsv_refvirTX
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast2ecsv_allvirTX
Example output of ds2015-149_idba.scaffold.tbltx.all_vir.csv:
#algo query_id nb_reads query_length accession description organism percentIdentity nb_hsps queryOverlap hitOverlap evalue score tax_id taxonomy sequence
"TBLASTX" "ds2015-149_52" "6" "117" "KX274275.1" "Grapevine rupestris stem pitting associated virus isolate SK704 B, complete genome" "Grapevine rupestris stem pitting-associated virus" "95.8" "3" "100" "3" "7.55823333338424e-05" "222.2257" "196400" "Viruses;ssRNA viruses;Betaflexiviridae;Foveavirus;Grapevine rupestris stem pitting-associated virus" "GAACACTATGAACGACAACTGGAAATCTGAGCACGCTATAAACACCCACAAACTCAAACGTAGACAAAGCTTTAACTAAGTTATTCATAATAATCACACCATGCCAAACACTTAAGG"
Step Rps2ecsv¶
This module uses the rpstblastn XML file and the PFAM taxonomy to annotate query sequences and produce a readable CSV file.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Rps2ecsv
vs: True
out: (SampleID)_idba.scaffold.tbltx.refseq_vir.csv
sge: True
type: TBLASTX
score: 50
Example output of ds2015-149_idba.scaffold.rps.pfam.csv:
#query_id query_length cdd_id hit_id evalue startQ endQ frame description superkingdom no rank family genus
"ds2015-149_0" "1428" "pfam01443" "gnl|CDD|279750" "1.33194e-06" "29" "223" "2" "pfam01443, Viral_helicase1, Viral (Superfamily 1) RNA helicase. Helicase activity for this family has been demonstrated and NTPase activity. This helicase has multiple roles at different stages of viral RNA replication, as dissected by mutational analysis." "Viruses(1.00);" "ssRNA viruses(0.99);unclassified viruses(0.01);" "Alphaflexiviridae(0.36);Virgaviridae(0.24);Betaflexiviridae(0.15);Tymoviridae(0.10);Bromoviridae(0.07);" "Potexvirus(0.26);Allexivirus(0.10);Carlavirus(0.08);Tymovirus(0.08);Tobamovirus(0.08);"
Step Ecsv2excel¶
This module takes all csv files and create a compile them in a single Excel file.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Ecsv2excel
qov: 50
hov: 5
Blast2ecsv_nr:
contigs: (SampleID)_idba.scaffold.fa
evalue: 0.001
fhit: True
pm: global
Example output of ds2015-149_idba.scaffold.xlsx:
Step Ecsv2krona¶
This module uses CSV files from Blast2ecsv module to create Krona html file.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Ecsv2krona
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Ecsv2krona_dmd
Example output of Krona:
Step Automapper¶
This module uses Blast CSV annotation file to select reference sequences, map query sequences and produce png of identity plot and alignment file in fasta format.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Automapper_nr
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Automapper_allvirTX
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Automapper_refseqTX
Example output of ds2015-149/ds2015-149_autoMapper_nr:
Step Rps2tree¶
This module use Rps2ecsv results of all sample to cut and group sequences based on identified domains and create OTUs, identity matrix, tree nexus files and png for each domains colored by SampleID.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Rps2tree
Step Getresults¶
This module simply copy important results file to a result directory.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Getresults