Example execution

Create a directory for your experiment:

mkdir test_virAnnot
cd test_virAnnot

Copy example read files, Illumina adapters fasta file, the sample id mapping file, the step and parameter file:

cp /path/to/virAnnot/examples/reads/*.fq .
cp /path/to/virAnnot/examples/adapters.fa .
cp /path/to/virAnnot/examples/map.txt .
cp /path/to/virAnnot/examples/step.yaml .
cp /path/to/virAnnot/examples/parameters.yaml .

You have to modify this file to fit your configuration.

This example contains all modules and options available and must be used as a template for your own analysis.

Step ReadSoustraction

This module uses bowtie2 to map reads against nucleotide sequence and Samtools to extract unmapped pairs.

Corresponding step.yaml section:

ReadSoustraction_phiX:
  i1: (file1)
  i2: (file2)
  db: phiX
  o1: (library)_phiX.r1.fq
  o2: (library)_phiX.r2.fq
  sge: True
  n_cpu: 5
  iter: library
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n ReadSoustraction_phiX

Step Demultiplex

This module uses cutadapt demultiplex reads from library and also trim reads from adapters and chimeric reads. Each demultiplexing step are described in the module section. Corresponding step.yaml section:

Demultiplex:
  i1: (library)_phiX.r1.fq
  i2: (library)_phiX.r2.fq
  adapters: adapters.fna
  middle: 1
  min_qual: 20
  polyA: True
  min_len: 70
  iter: library
  sge: True
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Demultiplex

Step DemultiplexHtml

This module gather all *_demultiplex.stats.csv files and create and html report. Corresponding step.yaml section:

DemultiplexHtml:
  csv: (library)_demultiplex.stats.csv
  id: (library)
  out: stat_demultiplex
  iter: global
  sge: True
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n DemultiplexHtml

Output example:

Step Diamond

This module launch Diamond similarity search for reads and produce an XML file per sample.

  n_cpu: 16
Map_spades:
  contigs: (SampleID)_spades.scaffold.fa
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  bam: (SampleID)_spades.scaffold.bam
  rn: (SampleID)_spades.scaffold.rn
  sge: True
  n_cpu: 16
Diamond:
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Diamond

Step Assembly

This module simply launch udba_ud, newbler and metaspades assemblers in each sample folder, rename scaffolds id and move the resulting fasta file.

  n_cpu: 5
  sge: True
drVM:
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  n_cpu: 20
  identity: 70
  min_len: 300
  sge: True
Assembly_idba:
  prog: idba
  n_cpu: 5
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq

Test both idba and spades:

virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Assembly_idba
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Assembly_spades

Example of idba assembly:

>ds2015-149_0
GTGTAAGGTGGTGAAGG...
>ds2015-149_1
CCTGCGAATTGGGCCAA...

Step Map

This module uses bowtie2 to map reads back on assemblies and samtools will count reads per contig.

Step file:

  out: (SampleID)_idba.scaffold.fa
  sge: True
Assembly_spades:
  prog: spades
  n_cpu: 5
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  out: (SampleID)_spades.scaffold.fa
  sge: True
Map_idba:
  contigs: (SampleID)_idba.scaffold.fa
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  bam: (SampleID)_idba.scaffold.bam
  rn: (SampleID)_idba.scaffold.rn
  sge: True

Command:

virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Map_idba
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Map_spades

Output a two column tabular file, column 1 sequence ID, column 2 number of reads. Example of .rn file produce:

ds2015-149_0    1179
ds2015-149_1    444
ds2015-149_10   26
ds2015-149_11   44
ds2015-149_12   14
ds2015-149_13   4
ds2015-149_14   6

Step Blast

This module is able to launch Blast(s) against provided databases localy or remotely. The script blast_launch.py must be present on distant servers and parameter.yaml modified to fit your servers.

Step file:

  i: (SampleID)_idba.scaffold.dmdx.nr.csv
  contigs: (SampleID)_idba.scaffold.dmdx2bltx.fa
  out: (SampleID)_idba.scaffold.dmdx2bltx.nr.xml
  type: blastx
  db: nr
  evalue: 0.0001
  server: genologin
  n_cpu: 8
  tc: 50
  num_chunk: 1000
  max_target_seqs: 1
  sge: True
Blast_allvirTX:
  type: tblastx
  contigs: (SampleID)_idba.scaffold.fa
  db: all_vir_nucl
  out: (SampleID)_idba.scaffold.tbltx.all_vir.xml
  evalue: 0.0001
  server: genotoul
  n_cpu: 8
  sge: True
  num_chunk: 1000
  tc: 50
Blast_nr:
  type: blastx
  contigs: (SampleID)_idba.scaffold.fa
  db: nr
  out: (SampleID)_idba.scaffold.bltx.nr.xml
  evalue: 0.0001
  server: genotoul
  n_cpu: 8
  tc: 50
  num_chunk: 1000
  max_target_seqs: 1
  sge: True
Blast_refvirTX:
  type: tblastx
  contigs: (SampleID)_idba.scaffold.fa
  db: refseq_vir_nucl
  out: (SampleID)_idba.scaffold.tbltx.refseq_vir.xml
  evalue: 0.0001
  server: genotoul
  n_cpu: 8

Commands:

virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast_nr
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast_refvirTX
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast_allvirTX
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast_RPS

Step Blast2ecsv

This module uses the XML file generated by the corresponding Blast module and the taxonomy contained in the SQLITE database to create a csv file containing match options, taxonomy string and sequences.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast2ecsv_nr
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast2ecsv_refvirTX
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Blast2ecsv_allvirTX

Example output of ds2015-149_idba.scaffold.tbltx.all_vir.csv:

#algo   query_id        nb_reads        query_length    accession       description     organism        percentIdentity nb_hsps queryOverlap    hitOverlap      evalue  score   tax_id  taxonomy        sequence
"TBLASTX"       "ds2015-149_52" "6"     "117"   "KX274275.1"    "Grapevine rupestris stem pitting associated virus isolate SK704 B, complete genome"    "Grapevine rupestris stem pitting-associated virus"     "95.8"  "3"     "100"   "3"     "7.55823333338424e-05"  "222.2257"      "196400"        "Viruses;ssRNA viruses;Betaflexiviridae;Foveavirus;Grapevine rupestris stem pitting-associated virus"   "GAACACTATGAACGACAACTGGAAATCTGAGCACGCTATAAACACCCACAAACTCAAACGTAGACAAAGCTTTAACTAAGTTATTCATAATAATCACACCATGCCAAACACTTAAGG"

Step Rps2ecsv

This module uses the rpstblastn XML file and the PFAM taxonomy to annotate query sequences and produce a readable CSV file.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Rps2ecsv
  vs: True
  out: (SampleID)_idba.scaffold.tbltx.refseq_vir.csv
  sge: True
  type: TBLASTX
  score: 50

Example output of ds2015-149_idba.scaffold.rps.pfam.csv:

#query_id       query_length    cdd_id  hit_id  evalue  startQ  endQ    frame   description     superkingdom    no rank family  genus
"ds2015-149_0"  "1428"  "pfam01443"     "gnl|CDD|279750"        "1.33194e-06"   "29"    "223"   "2"     "pfam01443, Viral_helicase1, Viral (Superfamily 1) RNA helicase.  Helicase activity for this family has been demonstrated and NTPase activity. This helicase has multiple roles at different stages of viral RNA replication, as dissected by mutational analysis."     "Viruses(1.00);"        "ssRNA viruses(0.99);unclassified viruses(0.01);"       "Alphaflexiviridae(0.36);Virgaviridae(0.24);Betaflexiviridae(0.15);Tymoviridae(0.10);Bromoviridae(0.07);"       "Potexvirus(0.26);Allexivirus(0.10);Carlavirus(0.08);Tymovirus(0.08);Tobamovirus(0.08);"

Step Ecsv2excel

This module takes all csv files and create a compile them in a single Excel file.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Ecsv2excel
  qov: 50
  hov: 5
Blast2ecsv_nr:
  contigs: (SampleID)_idba.scaffold.fa
  evalue: 0.001
  fhit: True
  pm: global

Example output of ds2015-149_idba.scaffold.xlsx:

_images/ecsv2excel.png

Step Ecsv2krona

This module uses CSV files from Blast2ecsv module to create Krona html file.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Ecsv2krona
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Ecsv2krona_dmd

Example output of Krona:

Step Automapper

This module uses Blast CSV annotation file to select reference sequences, map query sequences and produce png of identity plot and alignment file in fasta format.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Automapper_nr
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Automapper_allvirTX
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Automapper_refseqTX

Example output of ds2015-149/ds2015-149_autoMapper_nr:

Step Rps2tree

This module use Rps2ecsv results of all sample to cut and group sequences based on identified domains and create OTUs, identity matrix, tree nexus files and png for each domains colored by SampleID.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Rps2tree

Step Getresults

This module simply copy important results file to a result directory.
virAnnot.py -m map.txt -s step.yaml -p parameters.yaml -n Getresults