Giga 3 Qiime2 notes

We are going to use a set of 18S (Euk1391f-EukBr) samples taken from streams and the Atlantic Ocean in/near New Hampshire.

Importing your data

Our first task is to import our fastq.gz reads into qiime2 Artifacts.Artifact files have the extension qza, short for Qiime Zipped Archive or qzv, short for Qiime Zipped Visualization. They both contain the appropriate data, your feature table, your aligned sequences, your taxonomy visualization and provenance information. The provenance information will let us look back at the tree of qiime2 commands that led to that artifact, including all of the options your ran each command with.

In this case we are starting with demultiplexed paired end 250 bp, from an illumina HiSeq. Each sample has three files, the data we need is the froward read “*R1*.fastq.gz” and the reverse read “*R1*.fastq.gz” I have made a small python script to copy the files into a file structure recognized by qiime2.

sort_reads.py
## sort_reads.py ##
import glob
import os
import shutil
os.mkdir("reads")
for f in glob.glob("*/*.gz"):
    shutil.copy2(f, "reads/" + f.split("/")[1])
for f in glob.glob("reads/*R3*"):
    os.rename(f, f.replace("R3", "R2"))

qiime2 is installed as a conda environment, so that we can maintain qiime specific versions of the programs that it uses.

So we need to activate that environment each time we start working in a new session.

source activate qiime2-2018.6    # Activate the current qiime2-environment/sandbox
source tab-qiime                 # Load tab completion for qiime2

Now we can do the actual import. The key parts of this command is the type and source format.These tell qiime2 what the data is, and therefor how it should use it.

qiime tools import\
      --type 'SampleData[PairedEndSequencesWithQuality]'\
      --input-path reads\
      --source-format CasavaOneEightSingleLanePerSampleDirFmt\
      --output-path demux.qza

qiime demux summarize\
      --i-data demux.qza\
      --o-visualization demux.qzv

demux.qzv

The visualization demux.qzv gives us a similar quality plot to what you would see from fastqc.Based on the demux.qzv I chose to truncate the forward reads at 182 bp, and the reverse reads at 133 bp.

qiime dada2 denoise-paired\
      --i-demultiplexed-seqs demux.qza\
      --p-trim-left-f 0 --p-trim-left-r 0\
      --p-trunc-len-f 182 --p-trunc-len-r 133\
      --p-n-threads 72\
      --o-representative-sequences rep-seqs --o-table table --o-denoising-stats dns

qiime feature-table summarize\
      --i-table table.qza\
      --m-sample-metadata-file mdat.tsv\
      --o-visualization table

qiime feature-table tabulate-seqs\
      --i-data rep-seqs.qza\
      --o-visualization rep-seqs

qiime metadata tabulate\
      --m-input-file dns.qza\
      --o-visualization dns.qzv

table.qzv

rep-seqs.qzv

dns.qzv

Now that we have the representative sequences, we can assign taxonomy to them.I am going to base the taxonomy off of a blast against the SILVA 132 database. (In this case the practical reason is that the command is exactly the same for 16S and 18S).

qiime feature-classifier classify-consensus-blast\
      --i-query rep-seqs.qza\
      --i-reference-taxonomy /data/share/databases/SILVA_databases/SILVA_132_QIIME_release/taxonomy/taxonomy_all/99/majority_taxonomy_all_levels.qza\
      --i-reference-reads /data/share/databases/SILVA_databases/SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva132_99.qza\
      --o-classification taxonomy\
      --p-perc-identity 0.8\
      --p-maxaccepts 1

qiime metadata tabulate\
      --m-input-file taxonomy.qza\
      --o-visualization taxonomy.qzv

qiime taxa barplot\
      --i-table table.qza\
      --i-taxonomy taxonomy.qza\
      --m-metadata-file mdat.tsv\
      --o-visualization taxa-barplots.qzv

qiime feature-table heatmap\
      --i-table table.qza\
      --m-metadata-file mdat.tsv\
      --o-visualization heatmap.qzv\
      --m-metadata-column Type

taxonomy.qzv

taxa-barplots.qzv

Let’s pull out just certain groups, diatoms, what else..?

mkdir diatom

qiime taxa filter-table\
      --i-table table.qza\
      --i-taxonomy taxonomy.qza\
      --p-include Diatomea\
      --o-filtered-table diatom/table

qiime taxa barplot\
      --i-table diatom/table.qza\
      --i-taxonomy taxonomy.qza\
      --m-metadata-file mdat.tsv\
      --o-visualization diatom/taxa-barplots.qzv

qiime feature-table filter-seqs\
      --i-table diatom/table.qza\
      --i-data rep-seqs.qza\
      --o-filtered-data diatom/rep-seqs.qza

qiime phylogeny align-to-tree-mafft-fasttree\
      --i-sequences diatom/rep-seqs.qza\
      --o-alignment diatom/aligned-rep-seqs.qza\
      --o-masked-alignment diatom/masked-aligned-rep-seqs.qza\
      --o-tree diatom/unrooted-tree.qza\
      --o-rooted-tree diatom/rooted-tree.qza

Bacillariophyceae

taxa-barplots.qzv

Tree

Gastrotricha

taxa-barplots.qzv

Tree

Nematoda

taxa-barplots.qzv

Tree

Let’s make heatmaps for all the different taxonomic levels. We have 15 levels of taxonomy from silva, so it would be nice to not have to type 30 lines of qiime commands!

levels.sh

#! /bin/bash
mkdir heatmaps
for ((i = 1; i < 16; i++)); do
    qiime taxa collapse\
          --i-table table.qza\
          --i-taxonomy taxonomy.qza\
          --p-level $i\
          --o-collapsed-table heatmaps/level-$i-taxa-table
    qiime feature-table heatmap\
          --i-table heatmaps/level-$i-taxa-table.qza\
          --m-metadata-file mdat.tsv\
          --o-visualization heatmaps/level-$i-heatmap.qzv\
          --m-metadata-column Type\
          --p-cluster features &
done

Notice the ampersand after the second command, we can put it in the background and start the next taxa-collapse while the previous visualization is still computing! Lets make this script executable, then run it!

chmod +x levels.sh
./levels.sh

Create the phylogenetic tree

qiime phylogeny align-to-tree-mafft-fasttree\
      --i-sequences rep-seqs.qza\
      --o-alignment aligned-rep-seqs.qza\
      --o-masked-alignment masked-aligned-rep-seqs.qza\
      --o-tree unrooted-tree.qza\
      --o-rooted-tree rooted-tree.qza

Choose a rarefaction depth to normalize our samples to. [Use table.qzv]

qiime diversity alpha-rarefaction\
      --i-table  table.qza\
      --p-max-depth 100000\
      --p-min-depth 1000\
      --p-steps 30\
      --o-visualization alpha-rarefaction

qiime diversity core-metrics-phylogenetic\
      --i-phylogeny rooted-tree.qza\
      --i-table table.qza\
      --p-sampling-depth 22589\
      --m-metadata-file mdat.tsv\
      --output-dir core-metrics-results

alpha-rarefaction.qzv

Pick and visualize our favorite metrics…

qiime diversity alpha-group-significance\
      --i-alpha-diversity core-metrics-results/faith_pd_vector.qza\
      --m-metadata-file mdat.tsv\
      --o-visualization core-metrics-results/faith-pd-group-significance.qzv

qiime diversity beta-group-significance\
      --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza\
      --m-metadata-file mdat.tsv\
      --m-metadata-column Type\
      --o-visualization core-metrics-results/unweighted-unifrac-Type-significance.qzv\
      --p-pairwise

faith-pd-group-significance.qzv

unweighted-unifrac-Type-significance.qzv

unweighted_unifrac_emperor.qzv