# Giga 3 Qiime2 notes We are going to use a set of 18S (Euk1391f-EukBr) samples taken from streams and the Atlantic Ocean in/near New Hampshire. #### Some links [Qiime 2](https://qiime2.org/) [Qiime 2 view](https://view.qiime2.org/) [Overview of QIIME 2 Plugin Workflows](https://docs.qiime2.org/2018.6/tutorials/overview/#) [Qiime 2 Documentation](https://docs.qiime2.org) ## Importing your data Our first task is to import our fastq.gz reads into qiime2 Artifacts. Artifact files have the extension qza, short for Qiime Zipped Archive or qzv, short for Qiime Zipped Visualization. They both contain the appropriate data, your feature table, your aligned sequences, your taxonomy visualization and provenance information. The provenance information will let us look back at the tree of qiime2 commands that led to that artifact, including all of the options your ran each command with. In this case we are starting with demultiplexed paired end 250 bp, from an illumina HiSeq. Each sample has three files, the data we need is the froward read "\*R1\*.fastq.gz" and the reverse read "\*R1\*.fastq.gz" I have made a small python script to copy the files into a file structure recognized by qiime2. ~~~bash sort_reads.py ~~~ ~~~python ## sort_reads.py ## import glob import os import shutil os.mkdir("reads") for f in glob.glob("*/*.gz"): shutil.copy2(f, "reads/" + f.split("/")[1]) for f in glob.glob("reads/*R3*"): os.rename(f, f.replace("R3", "R2")) ~~~ qiime2 is installed as a conda environment, so that we can maintain qiime specific versions of the programs that it uses. So we need to activate that environment each time we start working in a new session. ~~~bash source activate qiime2-2018.6 # Activate the current qiime2-environment/sandbox source tab-qiime # Load tab completion for qiime2 ~~~ Now we can do the actual import. The key parts of this command is the type and source format. These tell qiime2 what the data is, and therefor how it should use it. ~~~bash qiime tools import\ --type 'SampleData[PairedEndSequencesWithQuality]'\ --input-path reads\ --source-format CasavaOneEightSingleLanePerSampleDirFmt\ --output-path demux.qza qiime demux summarize\ --i-data demux.qza\ --o-visualization demux.qzv ~~~ [demux.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Fdemux.qzv) The visualization demux.qzv gives us a similar quality plot to what you would see from fastqc. Based on the demux.qzv I chose to truncate the forward reads at 182 bp, and the reverse reads at 133 bp. ~~~bash qiime dada2 denoise-paired\ --i-demultiplexed-seqs demux.qza\ --p-trim-left-f 0 --p-trim-left-r 0\ --p-trunc-len-f 182 --p-trunc-len-r 133\ --p-n-threads 72\ --o-representative-sequences rep-seqs --o-table table --o-denoising-stats dns qiime feature-table summarize\ --i-table table.qza\ --m-sample-metadata-file mdat.tsv\ --o-visualization table qiime feature-table tabulate-seqs\ --i-data rep-seqs.qza\ --o-visualization rep-seqs qiime metadata tabulate\ --m-input-file dns.qza\ --o-visualization dns.qzv ~~~ [table.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Ftable.qzv) [rep-seqs.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Frep-seqs.qzv) [dns.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Fdns.qzv) Now that we have the representative sequences, we can assign taxonomy to them. I am going to base the taxonomy off of a blast against the SILVA 132 database. (In this case the practical reason is that the command is exactly the same for 16S and 18S). ~~~bash qiime feature-classifier classify-consensus-blast\ --i-query rep-seqs.qza\ --i-reference-taxonomy /data/share/databases/SILVA_databases/SILVA_132_QIIME_release/taxonomy/taxonomy_all/99/majority_taxonomy_all_levels.qza\ --i-reference-reads /data/share/databases/SILVA_databases/SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva132_99.qza\ --o-classification taxonomy\ --p-perc-identity 0.8\ --p-maxaccepts 1 qiime metadata tabulate\ --m-input-file taxonomy.qza\ --o-visualization taxonomy.qzv qiime taxa barplot\ --i-table table.qza\ --i-taxonomy taxonomy.qza\ --m-metadata-file mdat.tsv\ --o-visualization taxa-barplots.qzv qiime feature-table heatmap\ --i-table table.qza\ --m-metadata-file mdat.tsv\ --o-visualization heatmap.qzv\ --m-metadata-column Type ~~~ [taxonomy.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Ftaxonomy.qzv) [taxa-barplots.qzv](https://view.qiime2.org/?src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Ftaxa-barplot.qzv) Let's pull out just certain groups, diatoms, what else..? ~~~bash mkdir diatom qiime taxa filter-table\ --i-table table.qza\ --i-taxonomy taxonomy.qza\ --p-include Diatomea\ --o-filtered-table diatom/table qiime taxa barplot\ --i-table diatom/table.qza\ --i-taxonomy taxonomy.qza\ --m-metadata-file mdat.tsv\ --o-visualization diatom/taxa-barplots.qzv qiime feature-table filter-seqs\ --i-table diatom/table.qza\ --i-data rep-seqs.qza\ --o-filtered-data diatom/rep-seqs.qza qiime phylogeny align-to-tree-mafft-fasttree\ --i-sequences diatom/rep-seqs.qza\ --o-alignment diatom/aligned-rep-seqs.qza\ --o-masked-alignment diatom/masked-aligned-rep-seqs.qza\ --o-tree diatom/unrooted-tree.qza\ --o-rooted-tree diatom/rooted-tree.qza ~~~ ##### Bacillariophyceae [taxa-barplots.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2FBacillariophyceae%2Ftaxa-barplots.qzv) [Tree](https://itol.embl.de/tree/161010534191761540068523) ##### Bryozoa [taxa-barplots.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2FBryozoa%2Ftaxa-barplots.qzv) [Tree](https://itol.embl.de/tree/161010534199241540068797) ##### Chordata [taxa-barplots.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2FChordata%2Ftaxa-barplots.qzv) [Tree](https://itol.embl.de/tree/161010534203741540068984) ##### Cnidaria [taxa-barplots.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2FCnidaria%2Ftaxa-barplots.qzv) [Tree](https://itol.embl.de/tree/161010534206941540069111) ##### Gastrotricha [taxa-barplots.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2FGastrotricha%2Ftaxa-barplots.qzv) [Tree](https://itol.embl.de/tree/161010534213611540069280) ##### Kinorhyncha [taxa-barplots.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2FKinorhyncha%2Ftaxa-barplots.qzv) [Tree](https://itol.embl.de/tree/161010534231481540069736) ##### Nematoda [taxa-barplots.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2FNematoda%2Ftaxa-barplots.qzv) [Tree](https://itol.embl.de/tree/161010534251761540069876) Let's make heatmaps for all the different taxonomic levels. We have 15 levels of taxonomy from silva, so it would be nice to not have to type 30 lines of qiime commands! #### levels.sh ~~~bash #! /bin/bash mkdir heatmaps for ((i = 1; i < 16; i++)); do qiime taxa collapse\ --i-table table.qza\ --i-taxonomy taxonomy.qza\ --p-level $i\ --o-collapsed-table heatmaps/level-$i-taxa-table qiime feature-table heatmap\ --i-table heatmaps/level-$i-taxa-table.qza\ --m-metadata-file mdat.tsv\ --o-visualization heatmaps/level-$i-heatmap.qzv\ --m-metadata-column Type\ --p-cluster features & done ~~~ Notice the ampersand after the second command, we can put it in the background and start the next taxa-collapse while the previous visualization is still computing! Lets make this script executable, then run it! ~~~bash chmod +x levels.sh ./levels.sh ~~~ Create the phylogenetic tree ~~~bash qiime phylogeny align-to-tree-mafft-fasttree\ --i-sequences rep-seqs.qza\ --o-alignment aligned-rep-seqs.qza\ --o-masked-alignment masked-aligned-rep-seqs.qza\ --o-tree unrooted-tree.qza\ --o-rooted-tree rooted-tree.qza ~~~ Choose a rarefaction depth to normalize our samples to. [Use table.qzv] ~~~bash qiime diversity alpha-rarefaction\ --i-table table.qza\ --p-max-depth 100000\ --p-min-depth 1000\ --p-steps 30\ --o-visualization alpha-rarefaction qiime diversity core-metrics-phylogenetic\ --i-phylogeny rooted-tree.qza\ --i-table table.qza\ --p-sampling-depth 22589\ --m-metadata-file mdat.tsv\ --output-dir core-metrics-results ~~~ [alpha-rarefaction.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Falpha-rarefaction.qzv) Pick and visualize our favorite metrics... ~~~bash qiime diversity alpha-group-significance\ --i-alpha-diversity core-metrics-results/faith_pd_vector.qza\ --m-metadata-file mdat.tsv\ --o-visualization core-metrics-results/faith-pd-group-significance.qzv qiime diversity beta-group-significance\ --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza\ --m-metadata-file mdat.tsv\ --m-metadata-column Type\ --o-visualization core-metrics-results/unweighted-unifrac-Type-significance.qzv\ --p-pairwise ~~~ [faith-pd-group-significance.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Ffaith-pd-group-significance.qzv) [unweighted-unifrac-Type-significance.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Funweighted-unifrac-Type-significance.qzv) [unweighted_unifrac_emperor.qzv](https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fbitbucket.org%2Fdwthomas%2Fworkshop-notes%2Fraw%2Fmaster%2Fgiga3%2Fqiime2%2Funweighted_unifrac_emperor.qzv)