Parameter files
parameters.yaml
Defines the paths of the binaries and databases, both local and remote. A template is provided in the examples directory.
ReadSoustraction:
  db:
    vitis: '/media/data/db/ncbi/vitis/vitis'
    phiX: '/media/data/db/ncbi/phiX/phiX174'
bin:
  bowtie: '/usr/local/bin/bowtie2'
  samtools: '/usr/bin/samtools'
  bedtools: '/usr/bin/bedtools'
  prinseq: '/usr/local/bin/prinseq-lite.pl'
  merge-paired-reads: '/home/stheil/softwares/sortmerna-2.1-linux-64/scripts/merge-paired-reads.sh'
  unmerge-paired-reads: '/home/stheil/softwares/sortmerna-2.1-linux-64/scripts/unmerge-paired-reads.sh'
  sortmerna: '/home/stheil/softwares/sortmerna/sortmerna'
servers:
  enki:
    db:
      nt: '/media/data/db/ncbi/nt/nt'
      nr: '/media/data/db/ncbi/nr/nr'
      refseq_vir_nucl: '/media/data/db/ncbi/refseq_vir/viral.genomic.fna'
      refseq_vir_prot: '/media/data/db/ncbi/refseq_vir/viral.protein.faa'
      pfam: '/home/stheil/save/db/pfam/pfam_viruses_rpsdb'
      all_vir_nucl: '/media/data/db/ncbi/all_vir/all_vir_nucl.fna'
      all_vir_prot: '/media/data/db/ncbi/all_vir/all_vir_prot.faa'
  genotoul:
    adress: 'genotoul.toulouse.inra.fr'
    username: 'stheil'
    db:
      nr: '/bank/blastdb/nr'
      nt: '/bank/blastdb/nt'
      refseq_vir_nucl: '/save/stheil/db/refseq_vir/viral.genomic.fna'
      refseq_vir_prot: '/save/stheil/db/refseq_vir/viral.protein.faa'
      pfam: '/home/stheil/save/db/pfam/pfam_viruses_rpsdb'
      all_vir_nucl: '/home/stheil/save/db/all_vir/all_vir_nucl.fna'
      all_vir_prot: '/home/stheil/save/db/all_vir/all_vir_prot.faa'
    scratch: '/work/stheil'
    bin:
      blastx: 'blastx+'
      blastn: 'blastn+'
  genologin:
    adress: 'genologin.toulouse.inra.fr'
    username: 'mlefebvre'
    db:
      nr: '/bank/ncbi/blast/nr/current/blast/nr'
      nt: '/bank/ncbi/blast/nr/current/blast/nt'
      dmd_nr: '/bank/diamonddb/nr'
      refseq_vir_nucl: '/save/mlefebvre/db/refseq_vir/viral.genomic.fna'
      refseq_vir_prot: '/save/mlefebvre/db/refseq_vir/viral.protein.faa'
      pfam: '/home/mlefebvre/work/pfam/Pfam'
      all_vir_nucl: '/home/mlefebvre/save/db/all_vir/all_vir_nucl.fna'
      all_vir_prot: '/home/mlefebvre/save/db/all_vir/all_vir_prot.faa'
    scratch: '/work/mlefebvre'
    bin:
      blastx: 'blastx'
      blastn: 'blastn'
  avakas:
    adress: 'avakas.mcia.univ-bordeaux.fr'
    username: 'stheil'
    db:
      nr: '/home/stheil/db/nr/nr'
      nt: '/home/stheil/db/nt/nt'
      all_vir_nucl: '/home/stheil/scratch/db/all_vir/all_vir_nucl.fna'
      all_vir_prot: '/home/stheil/scratch/db/all_vir/all_vir_prot.faa'
      refseq_vir_nucl: '/home/stheil/scratch/db/refseq_vir/viral.genomic.fna'
      refseq_vir_prot: '/home/stheil/scratch/db/refseq_vir/viral.protein.faa'
      pfam: '/home/stheil/db/pfam/pfam_viruses_rpsdb'
    scratch: '/scratch/stheil'
    bin:
      blastx: 'blastx'
      blastn: 'blastn'
Diamond:
  db:
    all_vir_prot: /media/db/ncbi/all_vir/all_vir_prot
SortMeRna:
  db:
    silva-arc-16s-id95: /media/data/db/rRNA_databases/silva-arc-16s-id95
    silva-arc-23s-id98: /media/data/db/rRNA_databases/silva-arc-23s-id98
    silva-bac-16s-id90: /media/data/db/rRNA_databases/silva-bac-16s-id90
    silva-bac-23s-id98: /media/data/db/rRNA_databases/silva-bac-23s-id98
    silva-euk-18s-id95: /media/data/db/rRNA_databases/silva-euk-18s-id95
    silva-euk-28s-id98: /media/data/db/rRNA_databases/silva-euk-28s-id98
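To illustrate how such entries might be consumed, the following sketch looks up the path of a database on a given server. It assumes the file has already been parsed into nested dictionaries (for instance by a YAML loader) and that databases are stored under servers, then the server name, then db. The helper name resolve_db and this nesting are assumptions made for the example, not the pipeline's actual code.

```python
# Hypothetical excerpt of parameters.yaml after parsing into nested dicts.
# The nesting (servers -> <server name> -> db) is an assumption of this sketch.
params = {
    "servers": {
        "enki": {
            "db": {
                "nt": "/media/data/db/ncbi/nt/nt",
                "nr": "/media/data/db/ncbi/nr/nr",
            },
        },
        "genotoul": {
            "adress": "genotoul.toulouse.inra.fr",
            "username": "stheil",
            "db": {"nr": "/bank/blastdb/nr", "nt": "/bank/blastdb/nt"},
            "scratch": "/work/stheil",
            "bin": {"blastx": "blastx+", "blastn": "blastn+"},
        },
    },
}


def resolve_db(params, server, db_name):
    """Return the path registered for db_name on server (hypothetical helper)."""
    try:
        return params["servers"][server]["db"][db_name]
    except KeyError as exc:
        # Report which key was missing instead of a bare KeyError.
        raise KeyError(
            f"missing key {exc} while resolving {db_name!r} on {server!r}"
        ) from None


print(resolve_db(params, "genotoul", "nr"))  # /bank/blastdb/nr
```

With this layout, a step that declares db: nr and server: genotoul would resolve to the path listed for that server, while the same db name on another server can point elsewhere.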
step.yaml
Defines the steps that the pipeline will execute. A template is provided in the /examples directory.
Each step name corresponds to a Python module that launches the step. Step names are split on the ‘_’ character, so you can launch several instances of the same module. For example, to run both blastx and blastn, the step names could be ‘Blast_N’ and ‘Blast_X’. What comes after the underscore does not matter; it is only used to differentiate the two steps.
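The naming rule can be sketched in a few lines. This is only an illustration: the function name module_name is hypothetical, and splitting on the first underscore is an assumption consistent with the statement that the suffix is ignored.

```python
def module_name(step_name):
    """Return the part of a step name that precedes the first underscore.

    Hypothetical helper: the suffix only distinguishes several instances
    of the same module, so it is dropped here.
    """
    return step_name.split("_", 1)[0]


for step in ("Blast_N", "Blast_X", "Diamond_singletons_nr", "Demultiplex"):
    print(step, "->", module_name(step))
```

A name without an underscore, such as Demultiplex, is returned unchanged.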
Special words in parentheses are used as substitution strings:
- (file), (file1) and (file2)
- (SampleID)
- (library)
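As a rough illustration of this substitution, the sketch below replaces each special word with a value taken from one row of the map file. It assumes the replacement values come from map.txt columns with the same names; the function name substitute and the regular expression are illustrative choices, not the pipeline's implementation.

```python
import re

# Matches the special words listed above, including the parentheses.
PLACEHOLDER = re.compile(r"\((file|file1|file2|SampleID|library)\)")


def substitute(template, row):
    """Replace every special word in template with the matching value of row.

    Hypothetical helper; raises KeyError if row lacks a required column.
    """
    return PLACEHOLDER.sub(lambda match: row[match.group(1)], template)


# One row of map.txt, represented as a dictionary keyed by column name.
row = {
    "SampleID": "ds2016-121",
    "file1": "Lib1_phiX.R1.fastq",
    "file2": "Lib1_phiX.R2.fastq",
    "library": "lib1",
}

print(substitute("(SampleID)_truePairs_r1.fq", row))  # ds2016-121_truePairs_r1.fq
print(substitute("(library)_phiX.r1.fq", row))        # lib1_phiX.r1.fq
print(substitute("(file1)", row))                     # Lib1_phiX.R1.fastq
```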
ReadSoustraction_phiX:
  i1: (file1)
  i2: (file2)
  db: phiX
  o1: (library)_phiX.r1.fq
  o2: (library)_phiX.r2.fq
  sge: True
  n_cpu: 5
  iter: library
Demultiplex:
  i1: (library)_phiX.r1.fq
  i2: (library)_phiX.r2.fq
  adapters: adapters.fna
  middle: 1
  min_qual: 20
  polyA: True
  min_len: 70
  iter: library
  sge: True
DemultiplexHtml:
  csv: (library)_demultiplex.stats.csv
  id: (library)
  out: stat_demultiplex
  iter: global
  sge: True
Normalization:
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  o1: (SampleID)_truePairs_norm_r1.fq
  o2: (SampleID)_truePairs_norm_r2.fq
  num: 40000
  iter: sample
  n_cpu: 5
  sge: True
drVM:
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  n_cpu: 20
  identity: 70
  min_len: 300
  sge: True
Assembly_idba:
  prog: idba
  n_cpu: 5
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  out: (SampleID)_idba.scaffold.fa
  sge: True
Assembly_spades:
  prog: spades
  n_cpu: 5
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  out: (SampleID)_spades.scaffold.fa
  sge: True
Map_idba:
  contigs: (SampleID)_idba.scaffold.fa
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  bam: (SampleID)_idba.scaffold.bam
  rn: (SampleID)_idba.scaffold.rn
  sge: True
  n_cpu: 16
Map_spades:
  contigs: (SampleID)_spades.scaffold.fa
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  bam: (SampleID)_spades.scaffold.bam
  rn: (SampleID)_spades.scaffold.rn
  sge: True
  n_cpu: 16
Diamond:
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  n_cpu: 10
  sge: True
  score: 50
  evalue: 0.0001
  qov: 50
  hov: 5
  db: all_vir_prot
Diamond_singletons_nr:
  contigs: (SampleID)_idba.scaffold.fa
  db: nr
  ising: (SampleID)_singletons.fq
  n_cpu: 10
  sge: True
  out: (SampleID)_singletons_test.nr.dmdx.xml
  evalue: 0.001
  iter: sample
  score: 10
  qov: 10
Diamond2blast:
  i: (SampleID)_idba.scaffold.dmdx.nr.csv
  contigs: (SampleID)_idba.scaffold.dmdx2bltx.fa
  out: (SampleID)_idba.scaffold.dmdx2bltx.nr.xml
  type: blastx
  db: nr
  evalue: 0.0001
  server: genologin
  n_cpu: 8
  tc: 50
  num_chunk: 1000
  max_target_seqs: 1
  sge: True
Blast_allvirTX:
  type: tblastx
  contigs: (SampleID)_idba.scaffold.fa
  db: all_vir_nucl
  out: (SampleID)_idba.scaffold.tbltx.all_vir.xml
  evalue: 0.0001
  server: genotoul
  n_cpu: 8
  sge: True
  num_chunk: 1000
  tc: 50
Blast_nr:
  type: blastx
  contigs: (SampleID)_idba.scaffold.fa
  db: nr
  out: (SampleID)_idba.scaffold.bltx.nr.xml
  evalue: 0.0001
  server: genotoul
  n_cpu: 8
  tc: 50
  num_chunk: 1000
  max_target_seqs: 1
  sge: True
Blast_refvirTX:
  type: tblastx
  contigs: (SampleID)_idba.scaffold.fa
  db: refseq_vir_nucl
  out: (SampleID)_idba.scaffold.tbltx.refseq_vir.xml
  evalue: 0.0001
  server: genotoul
  n_cpu: 8
  tc: 50
  num_chunk: 1000
  sge: True
Blast_singleton_nr:
  type: blastx
  contigs: (SampleID)_singletons.fa
  db: nr
  out: (SampleID)_singletons.bltx.nr.xml
  evalue: 0.0001
  server: genologin
  n_cpu: 8
  tc: 10
  num_chunk: 1000
  sge: True
Blast_RPS:
  type: rpstblastn
  contigs: (SampleID)_idba.scaffold.fa
  db: pfam
  evalue: 0.0001
  out: (SampleID)_idba.scaffold.rps.pfam.xml
  server: genotoul
  n_cpu: 8
  sge: True
Blast2ecsv_allvirTX:
  contigs: (SampleID)_idba.scaffold.fa
  evalue: 0.001
  fhit: True
  pm: global
  if: xml
  rn: (SampleID)_idba.scaffold.rn
  r: True
  b: (SampleID)_idba.scaffold.tbltx.all_vir.xml
  vs: True
  out: (SampleID)_idba.scaffold.tbltx.all_vir.csv
  sge: True
  type: TBLASTX
  score: 50
  qov: 20
Blast2ecsv_refvirTX:
  contigs: (SampleID)_idba.scaffold.fa
  evalue: 0.0001
  fhit: True
  pm: global
  if: xml
  rn: (SampleID)_idba.scaffold.rn
  r: True
  b: (SampleID)_idba.scaffold.tbltx.refseq_vir.xml
  vs: True
  out: (SampleID)_idba.scaffold.tbltx.refseq_vir.csv
  sge: True
  type: TBLASTX
  score: 50
  qov: 50
  hov: 5
Blast2ecsv_nr:
  contigs: (SampleID)_idba.scaffold.fa
  evalue: 0.001
  fhit: True
  pm: global
  if: xml
  rn: (SampleID)_idba.scaffold.rn
  r: True
  b: (SampleID)_idba.scaffold.bltx.nr.xml
  vs: True
  out: (SampleID)_idba.scaffold.bltx.nr.csv
  sge: True
  type: BLASTX
  score: 50
  qov: 5
  hov: 5
Blast2ecsv_dmd:
  evalue: 0.01
  fhit: True
  pm: global
  if: xml
  r: True
  b: (SampleID)_dmd.xml
  out: (SampleID)_dmd.allVirProt.csv
  sge: True
  type: BLASTX
  pd: True
Blast2ecsv_dmdx_singletons_nr:
  contigs: (SampleID)_idba.scaffold.fa
  evalue: 0.001
  fhit: True
  pm: global
  if: xml
  rn: (SampleID)_idba.scaffold.rn
  r: True
  b: (SampleID)_singletons.nr.dmdx.xml
  vs: True
  out: (SampleID)_singletons_test.nr.dmdx.csv
  sge: True
  type: DIAMONDX
  pd: True
Rps2ecsv:
  b: (SampleID)_idba.scaffold.rps.pfam.xml
  out: (SampleID)_idba.scaffold.rps.pfam.csv
  evalue: 0.0001
  sge: True
Ecsv2excel:
  b1: (SampleID)_idba.scaffold.tbltx.refseq_vir.csv
  b2: (SampleID)_idba.scaffold.tbltx.all_vir.csv
  b3: (SampleID)_idba.scaffold.bltx.nr.csv
  r: (SampleID)_idba.scaffold.rps.pfam.csv
  out: (SampleID)_idba.scaffold.xlsx
  sge: True
Ecsv2compare:
  b1: (SampleID)_idba.scaffold.bltx.nr.csv
  r: (SampleID)_idba.scaffold.rps.pfam.csv
  out: (SampleID)_idba.scaffold.comparison.xlsx
  sge: True
Blast2hist:
  id1: (SampleID)_refseq_tbltx
  b1: (SampleID)_idba.scaffold.tbltx.refseq_vir.csv
  id2: (SampleID)_allvir_tbltx
  b2: (SampleID)_idba.scaffold.tbltx.all_vir.csv
  id3: (SampleID)_nr_bltx
  b3: (SampleID)_idba.scaffold.bltx.nr.csv
  id4: (SampleID)_dmd
  b4: (SampleID)_dmd.allVirProt.csv
  iter: global
  sge: True
  out: blast_hist
Ecsv2krona:
  id1: (SampleID)_refseq_tbltx
  b1: (SampleID)_idba.scaffold.tbltx.refseq_vir.csv
  x1: (SampleID)_idba.scaffold.tbltx.refseq_vir.xml
  id2: (SampleID)_allvir_tbltx
  b2: (SampleID)_idba.scaffold.tbltx.all_vir.csv
  x2: (SampleID)_idba.scaffold.tbltx.all_vir.xml
  id3: (SampleID)_nr_bltx
  b3: (SampleID)_idba.scaffold.bltx.nr.csv
  x3: (SampleID)_idba.scaffold.bltx.nr.xml
  outdir: krona_blast
  out: blast.global.krona.html
  data: both
  r: True
  c: identity
  iter: global
  sge: True
Ecsv2krona_dmd:
  id1: (SampleID)
  b1: (SampleID)_dmd.allVirProt.csv
  outdir: krona_diamond
  out: global_krona_dmd.html
  data: contig
  r: True
  c: identity
  iter: global
  sge: True
Automapper_nr:
  contigs: (SampleID)_idba.scaffold.fa
  ecsv: (SampleID)_idba.scaffold.bltx.nr.csv
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  out: (SampleID)_autoMapper_nr
  sge: True
  ref: nt
Automapper_allvirTX:
  contigs: (SampleID)_idba.scaffold.fa
  ecsv: (SampleID)_idba.scaffold.tbltx.all_vir.csv
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  out: (SampleID)_autoMapper_allvir
  sge: True
  ref: all_vir_nucl
Automapper_refseqTX:
  contigs: (SampleID)_idba.scaffold.fa
  ecsv: (SampleID)_idba.scaffold.tbltx.refseq_vir.csv
  i1: (SampleID)_truePairs_r1.fq
  i2: (SampleID)_truePairs_r2.fq
  out: (SampleID)_autoMapper_refseq
  sge: True
  ref: refseq_vir_nucl
Rps2tree:
  pfam: (SampleID)_idba.scaffold.rps.pfam.csv
  contigs: (SampleID)_idba.scaffold.fa
  ecsv: (SampleID)_idba.scaffold.bltx.nr.csv
  id: (SampleID)
  out: rps2tree_global
  min_prot: 100
  viral_portion: 0.3
  perc: 90
  iter: global
  sge: True
Getresults:
  global_dir1: rps2tree_global
  global_dir2: krona_blast
  global_dir3: krona_diamond
  global_dir4: blast_hist
  global_dir5: stat_demultiplex
  sample_dir1: (SampleID)_autoMapper_nr
  sample_dir2: (SampleID)_autoMapper_refseq
  sample_dir3r: (SampleID)_autoMapper_allvir
  sample_file1: (SampleID)_idba.scaffold.xlsx
  sample_file2: (SampleID)_idba.scaffold.fa
  sample_file3: (SampleID)_spades.scaffold.fa
  sample_file4: (SampleID)_truePairs_r1.fq
  sample_file5: (SampleID)_truePairs_r2.fq
  out: results
map.txt
The map file describes the experiment. It is a tab-delimited file whose first line contains the column headers and starts with ‘#’. It must contain at least two columns: SampleID and file.
A template is provided in the examples directory.
This is a minimal map.txt file:
#SampleID mid common file1 file2 library
ds2016-121 AACCGCAA TGTGTTGGGTGTGTTTGG Lib1_phiX.R1.fastq Lib1_phiX.R2.fastq lib1
ds2016-132 AACTAGTA TGTGTTGGGTGTGTTTGG Lib1_phiX.R1.fastq Lib1_phiX.R2.fastq lib1
ds2016-122 AGGCGCCT TGTGTTGGGTGTGTTTGG Lib2_phiX.R1.fastq Lib2_phiX.R2.fastq lib2
ds2016-133 ATTAGCTA TGTGTTGGGTGTGTTTGG Lib2_phiX.R1.fastq Lib2_phiX.R2.fastq lib2
ds2016-123 CAAGAGTT TGTGTTGGGTGTGTTTGG Lib3_phiX.R1.fastq Lib3_phiX.R2.fastq lib3
ds2016-55 CAAGCAGG TGTGTTGGGTGTGTTTGG Lib3_phiX.R1.fastq Lib3_phiX.R2.fastq lib3
ds2016-124 CCAACCAT TGTGTTGGGTGTGTTTGG Lib4_phiX.R1.fastq Lib4_phiX.R2.fastq lib4
ds2016-56 CGATAGAG TGTGTTGGGTGTGTTTGG Lib4_phiX.R1.fastq Lib4_phiX.R2.fastq lib4
ds2016-125 GCTCTACC TGTGTTGGGTGTGTTTGG Lib5_phiX.R1.fastq Lib5_phiX.R2.fastq lib5
ds2016-57 GCTGCGGT TGTGTTGGGTGTGTTTGG Lib5_phiX.R1.fastq Lib5_phiX.R2.fastq lib5
ds2016-58 GGCCAGAA TGTGTTGGGTGTGTTTGG Lib6_phiX.R1.fastq Lib6_phiX.R2.fastq lib6
ds2016-10 GGTACTCC TGTGTTGGGTGTGTTTGG Lib6_phiX.R1.fastq Lib6_phiX.R2.fastq lib6
ds2016-11 TCGGATGC TGTGTTGGGTGTGTTTGG Lib7_phiX.R1.fastq Lib7_phiX.R2.fastq lib7
ds2015-149 TCTATGAC TGTGTTGGGTGTGTTTGG Lib7_phiX.R1.fastq Lib7_phiX.R2.fastq lib7
ds2015-162 TTCTGGCT TGTGTTGGGTGTGTTTGG Lib8_phiX.R1.fastq Lib8_phiX.R2.fastq lib8
ds2015-170 TTGCGTCA TGTGTTGGGTGTGTTTGG Lib8_phiX.R1.fastq Lib8_phiX.R2.fastq lib8
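A file laid out like this one can be read with Python's standard csv module. The sketch below is only an illustration: the function name read_map is hypothetical, and accepting any column whose name starts with "file" (file, file1, file2) as the required file column is an assumption.

```python
import csv
import io

# Two rows of the example above, with tab-separated columns.
MAP_TEXT = (
    "#SampleID\tmid\tcommon\tfile1\tfile2\tlibrary\n"
    "ds2016-121\tAACCGCAA\tTGTGTTGGGTGTGTTTGG\t"
    "Lib1_phiX.R1.fastq\tLib1_phiX.R2.fastq\tlib1\n"
    "ds2016-132\tAACTAGTA\tTGTGTTGGGTGTGTTTGG\t"
    "Lib1_phiX.R1.fastq\tLib1_phiX.R2.fastq\tlib1\n"
)


def read_map(handle):
    """Parse a map file into a list of dictionaries, one per sample."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if not header or not header[0].startswith("#"):
        raise ValueError("the header line must start with '#'")
    header[0] = header[0].lstrip("#")  # '#SampleID' -> 'SampleID'
    if "SampleID" not in header:
        raise ValueError("missing required column: SampleID")
    if not any(name.startswith("file") for name in header):
        raise ValueError("missing required column: file")
    # Skip empty lines; map each remaining row onto the header names.
    return [dict(zip(header, row)) for row in reader if row]


rows = read_map(io.StringIO(MAP_TEXT))
print(rows[0]["SampleID"], rows[0]["file1"])  # ds2016-121 Lib1_phiX.R1.fastq
```

To read a real file, pass an open file handle instead of the io.StringIO wrapper.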
You can add category columns for each sample; they can be used to color sequences in the trees produced by the Rps2tree module. One library can be shared by several samples, as shown in the example. The demultiplexing step is then able to tell the samples apart and separate them.
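The relation between libraries and samples can be made explicit by grouping the rows of the map file by their library column. This is a minimal sketch that assumes the rows are already available as dictionaries keyed by column name; the function name samples_by_library is hypothetical.

```python
from collections import defaultdict

# Four rows of the example map file, reduced to the relevant columns.
rows = [
    {"SampleID": "ds2016-121", "mid": "AACCGCAA", "library": "lib1"},
    {"SampleID": "ds2016-132", "mid": "AACTAGTA", "library": "lib1"},
    {"SampleID": "ds2016-122", "mid": "AGGCGCCT", "library": "lib2"},
    {"SampleID": "ds2016-133", "mid": "ATTAGCTA", "library": "lib2"},
]


def samples_by_library(rows):
    """Group sample identifiers by the library they were sequenced in."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["library"]].append(row["SampleID"])
    return dict(groups)


print(samples_by_library(rows))
# {'lib1': ['ds2016-121', 'ds2016-132'], 'lib2': ['ds2016-122', 'ds2016-133']}
```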