We are going to use a set of 18S (Euk1391f-EukBr) samples taken from streams and the Atlantic Ocean in/near New Hampshire.
Our first task is to import our fastq.gz reads into qiime2 Artifacts.Artifact files have the extension qza, short for Qiime Zipped Archive or qzv, short for Qiime Zipped Visualization. They both contain the appropriate data, your feature table, your aligned sequences, your taxonomy visualization and provenance information. The provenance information will let us look back at the tree of qiime2 commands that led to that artifact, including all of the options your ran each command with.
In this case we are starting with demultiplexed paired end 250 bp, from an illumina HiSeq. Each sample has three files, the data we need is the froward read “*R1*.fastq.gz” and the reverse read “*R1*.fastq.gz” I have made a small python script to copy the files into a file structure recognized by qiime2.
sort_reads.py
## sort_reads.py ##
import glob
import os
import shutil
os.mkdir("reads")
for f in glob.glob("*/*.gz"):
shutil.copy2(f, "reads/" + f.split("/")[1])
for f in glob.glob("reads/*R3*"):
os.rename(f, f.replace("R3", "R2"))
qiime2 is installed as a conda environment, so that we can maintain qiime specific versions of the programs that it uses.
So we need to activate that environment each time we start working in a new session.
source activate qiime2-2018.6 # Activate the current qiime2-environment/sandbox
source tab-qiime # Load tab completion for qiime2
Now we can do the actual import. The key parts of this command is the type and source format.These tell qiime2 what the data is, and therefor how it should use it.
qiime tools import\
--type 'SampleData[PairedEndSequencesWithQuality]'\
--input-path reads\
--source-format CasavaOneEightSingleLanePerSampleDirFmt\
--output-path demux.qza
qiime demux summarize\
--i-data demux.qza\
--o-visualization demux.qzv
The visualization demux.qzv gives us a similar quality plot to what you would see from fastqc.Based on the demux.qzv I chose to truncate the forward reads at 182 bp, and the reverse reads at 133 bp.
qiime dada2 denoise-paired\
--i-demultiplexed-seqs demux.qza\
--p-trim-left-f 0 --p-trim-left-r 0\
--p-trunc-len-f 182 --p-trunc-len-r 133\
--p-n-threads 72\
--o-representative-sequences rep-seqs --o-table table --o-denoising-stats dns
qiime feature-table summarize\
--i-table table.qza\
--m-sample-metadata-file mdat.tsv\
--o-visualization table
qiime feature-table tabulate-seqs\
--i-data rep-seqs.qza\
--o-visualization rep-seqs
qiime metadata tabulate\
--m-input-file dns.qza\
--o-visualization dns.qzv
Now that we have the representative sequences, we can assign taxonomy to them.I am going to base the taxonomy off of a blast against the SILVA 132 database. (In this case the practical reason is that the command is exactly the same for 16S and 18S).
qiime feature-classifier classify-consensus-blast\
--i-query rep-seqs.qza\
--i-reference-taxonomy /data/share/databases/SILVA_databases/SILVA_132_QIIME_release/taxonomy/taxonomy_all/99/majority_taxonomy_all_levels.qza\
--i-reference-reads /data/share/databases/SILVA_databases/SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva132_99.qza\
--o-classification taxonomy\
--p-perc-identity 0.8\
--p-maxaccepts 1
qiime metadata tabulate\
--m-input-file taxonomy.qza\
--o-visualization taxonomy.qzv
qiime taxa barplot\
--i-table table.qza\
--i-taxonomy taxonomy.qza\
--m-metadata-file mdat.tsv\
--o-visualization taxa-barplots.qzv
qiime feature-table heatmap\
--i-table table.qza\
--m-metadata-file mdat.tsv\
--o-visualization heatmap.qzv\
--m-metadata-column Type
Let’s pull out just certain groups, diatoms, what else..?
mkdir diatom
qiime taxa filter-table\
--i-table table.qza\
--i-taxonomy taxonomy.qza\
--p-include Diatomea\
--o-filtered-table diatom/table
qiime taxa barplot\
--i-table diatom/table.qza\
--i-taxonomy taxonomy.qza\
--m-metadata-file mdat.tsv\
--o-visualization diatom/taxa-barplots.qzv
qiime feature-table filter-seqs\
--i-table diatom/table.qza\
--i-data rep-seqs.qza\
--o-filtered-data diatom/rep-seqs.qza
qiime phylogeny align-to-tree-mafft-fasttree\
--i-sequences diatom/rep-seqs.qza\
--o-alignment diatom/aligned-rep-seqs.qza\
--o-masked-alignment diatom/masked-aligned-rep-seqs.qza\
--o-tree diatom/unrooted-tree.qza\
--o-rooted-tree diatom/rooted-tree.qza
Let’s make heatmaps for all the different taxonomic levels. We have 15 levels of taxonomy from silva, so it would be nice to not have to type 30 lines of qiime commands!
#! /bin/bash
mkdir heatmaps
for ((i = 1; i < 16; i++)); do
qiime taxa collapse\
--i-table table.qza\
--i-taxonomy taxonomy.qza\
--p-level $i\
--o-collapsed-table heatmaps/level-$i-taxa-table
qiime feature-table heatmap\
--i-table heatmaps/level-$i-taxa-table.qza\
--m-metadata-file mdat.tsv\
--o-visualization heatmaps/level-$i-heatmap.qzv\
--m-metadata-column Type\
--p-cluster features &
done
Notice the ampersand after the second command, we can put it in the background and start the next taxa-collapse while the previous visualization is still computing! Lets make this script executable, then run it!
chmod +x levels.sh
./levels.sh
Create the phylogenetic tree
qiime phylogeny align-to-tree-mafft-fasttree\
--i-sequences rep-seqs.qza\
--o-alignment aligned-rep-seqs.qza\
--o-masked-alignment masked-aligned-rep-seqs.qza\
--o-tree unrooted-tree.qza\
--o-rooted-tree rooted-tree.qza
Choose a rarefaction depth to normalize our samples to. [Use table.qzv]
qiime diversity alpha-rarefaction\
--i-table table.qza\
--p-max-depth 100000\
--p-min-depth 1000\
--p-steps 30\
--o-visualization alpha-rarefaction
qiime diversity core-metrics-phylogenetic\
--i-phylogeny rooted-tree.qza\
--i-table table.qza\
--p-sampling-depth 22589\
--m-metadata-file mdat.tsv\
--output-dir core-metrics-results
Pick and visualize our favorite metrics…
qiime diversity alpha-group-significance\
--i-alpha-diversity core-metrics-results/faith_pd_vector.qza\
--m-metadata-file mdat.tsv\
--o-visualization core-metrics-results/faith-pd-group-significance.qzv
qiime diversity beta-group-significance\
--i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza\
--m-metadata-file mdat.tsv\
--m-metadata-column Type\
--o-visualization core-metrics-results/unweighted-unifrac-Type-significance.qzv\
--p-pairwise
faith-pd-group-significance.qzv