Metagenomics Evaluation & Testing Analysis (META) Simulator
Compares open-source metagenomic classification tool performance (precision, sensitivity, runtime) across various sequencing platforms (Illumina MiSeq/iSeq, Oxford Nanopore MinION) and use cases (metagenomic profiles).
Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Prerequisites
The META system has been designed to run on Linux (specifically, tested on Ubuntu 18.04) and in Docker containers. The following packages are required:
Here is an example of how to install these on Ubuntu 18.04:
# Install Docker engine (reference: https://docs.docker.com/engine/install/ubuntu/)
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo docker run hello-world # to verify successful install
Installing
To build the META Simulator, run the following from the root directory of meta_simulator:
docker build -t meta_simulator:latest .
Integrating with Docker-based Meta System
To integrate the META simulator with the Docker-based Meta System, you will need to export the meta_simulator into a docker tarfile and save it in the meta_system/data/docker directory.
- To export the meta_simulator, run the following command:
  docker save -o meta_simulator.tar meta_simulator:latest
- Move meta_simulator.tar to meta_system/data/docker
- To make sure it loads on meta_system, run make load-docker on meta_system
Running
The META Simulator requires an abundance profile TSV. An abundance profile is expressed as a tab-delimited text file (TSV) where the first column contains the leaf taxonomic ID, the second column contains the corresponding abundance proportion (must sum to 1.000000), and the third column designates the organism as being foreground (1) or background (0). There should be no headers in the abundance profile TSV. An example is shown below:
400667 0.10 1
435590 0.10 1
367928 0.10 1
864803 0.10 1
1091045 0.10 1
349101 0.10 1
1282 0.10 1
260799 0.10 1
1529886 0.10 1
198094 0.10 1
An example TSV is included within the Docker container in data/strawman_envassay.tsv.
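Because the format is strict (three tab-separated columns, no header, abundances summing to 1), it can help to check a custom profile before starting a simulation. The following is a minimal sketch and not part of META Simulator: validate_profile is a hypothetical helper name, and the 1e-6 tolerance on the sum is an assumption based on the six decimal places quoted above.

```shell
# Hypothetical helper (not shipped with META Simulator): check an abundance
# profile TSV against the format described above. Returns 0 if the file
# looks valid, 1 otherwise, and prints one message per problem found.
validate_profile() {
  awk -F '\t' '
    NF != 3 {
      printf "line %d: expected 3 tab-separated columns, found %d\n", NR, NF
      bad = 1
      next
    }
    # Column 1: leaf taxonomic ID, digits only.
    $1 !~ /^[0-9]+$/ { printf "line %d: taxonomic ID is not an integer: %s\n", NR, $1; bad = 1 }
    # Column 2: abundance proportion as a plain decimal.
    $2 !~ /^[0-9]*\.?[0-9]+$/ { printf "line %d: abundance is not a plain decimal: %s\n", NR, $2; bad = 1 }
    # Column 3: foreground (1) or background (0).
    $3 != "0" && $3 != "1" { printf "line %d: foreground flag must be 0 or 1: %s\n", NR, $3; bad = 1 }
    { sum += $2 }
    END {
      diff = sum - 1
      if (diff < 0) diff = -diff
      # Assumed tolerance: 1e-6, matching the 1.000000 precision in the text.
      if (diff > 1e-6) { printf "abundances sum to %.6f, expected 1.000000\n", sum; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}

# Usage (my_profile.tsv is a placeholder file name):
#   validate_profile my_profile.tsv && echo "profile looks valid"
```

A header row or space-separated columns will be reported as a column-count or format error, which matches the requirement that the file contain no headers.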
The META Simulator accepts the following arguments:
- -t number of threads to use for simulations
- -i list of taxid with associated abundance (totalling 1.0)
- -p sequencing platform to simulate reads for (case sensitive)
  - The options are:
    - iseq Illumina iSeq 100
    - miseq Illumina MiSeq (assuming both Illumina platforms have spot count of 8M, and taking 1/100 of this) [80,000]
    - r9 Oxford Nanopore R9 flowcell (MIN106) - best performance at 50Gbp output (will assume 20Gbp and 20kb avg read length = 1M reads) [10,000]
    - flg Oxford Nanopore Flongle flowcell (FLG001) - best performance at 2Gbp output (1/25 of r9) (assuming 10% of r9 output) [1,000]
- -o Output directory (combined fastq file for classification will be at $outdir/simulated.fastq)
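The bracketed read counts can be reproduced from the assumptions stated for each platform. The arithmetic below is only a sketch of that reasoning: the variable names are illustrative, and the 1/100 factor applied to the R9 total is an assumption inferred from the bracketed value (1M reads reduced to 10,000), since the text states it explicitly only for the Illumina platforms.

```shell
# Illumina (iseq, miseq): spot count of 8M, taking 1/100 of this.
illumina_reads=$(( 8000000 / 100 ))

# Nanopore R9: 20 Gbp of output at 20 kb average read length = 1M reads.
r9_total_reads=$(( 20000000000 / 20000 ))
# Assumption: the same 1/100 factor is applied, which gives the bracketed value.
r9_reads=$(( r9_total_reads / 100 ))

# Flongle: assumed to produce 10% of the R9 output.
flg_reads=$(( r9_reads / 10 ))

echo "illumina=$illumina_reads r9=$r9_reads flg=$flg_reads"
# prints: illumina=80000 r9=10000 flg=1000
```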
DeepSimulator
To run DeepSimulator (Nanopore R9 flowcell) using META Simulator, run:
docker run meta_simulator:latest bash scripts/sim_module_wrapper.sh -t 2 -i data/strawman_envassay.tsv -p r9 -o data/test
To run DeepSimulator (Nanopore Flongle flowcell) using META Simulator, run:
docker run meta_simulator:latest bash scripts/sim_module_wrapper.sh -t 2 -i data/strawman_envassay.tsv -p flg -o data/test
InSilicoSeq
To run InSilicoSeq (Illumina MiSeq) using META Simulator, run:
docker run meta_simulator:latest bash scripts/sim_module_wrapper.sh -t 2 -i data/strawman_envassay.tsv -p miseq -o data/test
To run InSilicoSeq (Illumina iSeq) using META Simulator, run:
docker run meta_simulator:latest bash scripts/sim_module_wrapper.sh -t 2 -i data/strawman_envassay.tsv -p iseq -o data/test
If you wish to run the simulator with your own abundance profile, use the -v flag of docker run to bind-mount the host directory containing your abundance profile TSV into the container, and pass the path of the TSV inside the container to -i.
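As a sketch of what such an invocation might look like, the hypothetical function below builds the docker run command for a custom profile. Everything not shown earlier in this README is an assumption made for illustration: the function name run_custom_profile, the DRY_RUN variable, the container mount point /mnt/profiles, and the output subdirectory. It also assumes the wrapper script accepts absolute paths for -i and -o. Docker requires the host side of a bind mount to be an absolute path.

```shell
# Hypothetical wrapper (not part of META Simulator).
# Arguments: <absolute host dir> <profile file name> <platform> [threads]
# The body runs in a subshell, so its variables do not leak into the caller.
run_custom_profile() (
  host_dir=$1
  profile=$2
  platform=$3
  threads=${4:-2}
  mount_point=/mnt/profiles   # assumed container path, chosen for illustration

  # Writing output under the mounted directory keeps results on the host.
  set -- docker run -v "$host_dir:$mount_point" meta_simulator:latest \
    bash scripts/sim_module_wrapper.sh \
    -t "$threads" -i "$mount_point/$profile" -p "$platform" -o "$mount_point/output"

  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"
  fi
)

# Usage (paths are placeholders):
#   DRY_RUN=1 run_custom_profile "$(pwd)/profiles" my_profile.tsv r9
#   run_custom_profile "$(pwd)/profiles" my_profile.tsv r9 4
```

Printing the command first with DRY_RUN=1 is a cheap way to confirm that the host directory and profile name are what you expect before starting a long simulation.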
License
This project is licensed under Apache 2.0. Copyright under Johns Hopkins University Applied Physics Laboratory.