MEGAHIT Assembly

MEGAHIT is a single-node assembler for large and complex metagenomic NGS data sets, such as those from soil. It uses a succinct de Bruijn graph (SdBG) to keep memory usage low during assembly. MEGAHIT can optionally use a CUDA-enabled GPU to accelerate SdBG construction. See the MEGAHIT home page for more information.
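MEGAHIT's succinct graph encoding is beyond a short example, but the underlying de Bruijn graph idea can be sketched in a few lines of shell and awk. This is a plain, uncompressed graph and not MEGAHIT's SdBG; the function name dbg_edges, the read ACGTACG and the k-mer size 4 are hypothetical illustration values. Each k-mer of the read becomes an edge from its first k-1 bases to its last k-1 bases:

```shell
# Illustration only: edges of a plain (non-succinct) de Bruijn graph for one read.
# dbg_edges is a hypothetical helper, not part of MEGAHIT.
dbg_edges() {
  local seq=$1 k=$2
  awk -v s="$seq" -v k="$k" 'BEGIN {
    # slide a window of length k over the read
    for (i = 1; i + k - 1 <= length(s); i++) {
      kmer = substr(s, i, k)
      # edge: (k-1)-prefix -> (k-1)-suffix of the k-mer
      print substr(kmer, 1, k - 1) " -> " substr(kmer, 2, k - 1)
    }
  }'
}

dbg_edges ACGTACG 4
```

This prints ACG -> CGT, CGT -> GTA, GTA -> TAC and TAC -> ACG. The last edge leads back to ACG, which shows how a repeated sequence turns into a cycle in the graph, one of the situations an assembler has to resolve.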

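MEGAHIT can be run with the following command, where -1 and -2 give the two files of the paired-end reads and -o sets the output directory. As our AWS instance has 16 cores, we use the option -t 16 to tell MEGAHIT to use 16 parallel threads. The >& redirects both standard output and standard error to the file megahit.log, and the trailing & starts the job in the background: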

cd /vol/spool/tutorial-data
megahit -1 read1.fq -2 read2.fq -t 16 -o megahit_out >& megahit.log &

The contig sequences are written to the file final.contigs.fa in the megahit_out directory. Once the assembly has finished, let's again get some basic statistics on the contigs:

getN50.pl -s 500 -f megahit_out/final.contigs.fa
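To make the reported N50 less of a black box, here is an illustrative re-implementation of the basic statistics in awk. It is not the code of getN50.pl, and it assumes that -s 500 means "ignore contigs shorter than 500 bp"; the helper name contig_stats is hypothetical. N50 is the length L such that contigs of length at least L together cover at least half of the total assembly length:

```shell
# Illustrative sketch, not getN50.pl. Assumption: min_len mirrors the -s option.
contig_stats() {
  local fasta=$1 min_len=${2:-500}
  # Step 1: print the length of every contig with length >= min_len
  awk -v min="$min_len" '
    /^>/ { if (seen && len >= min) print len; seen = 1; len = 0; next }  # header line: flush the previous contig
         { len += length($0) }                                          # sequence line: add its length
    END  { if (seen && len >= min) print len }                          # flush the last contig
  ' "$fasta" |
  sort -rn |
  # Step 2: walk the lengths from longest to shortest until half the total is covered
  awk '
    { lens[NR] = $1; total += $1 }
    END {
      half = total / 2; run = 0
      for (i = 1; i <= NR; i++) { run += lens[i]; if (run >= half) { n50 = lens[i]; break } }
      printf "contigs=%d total=%d N50=%d\n", NR, total, n50
    }'
}
```

For example, contig_stats megahit_out/final.contigs.fa 500 prints the number of contigs of at least 500 bp, their total length and their N50; if the assumption about -s holds, these should agree with the getN50.pl output.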

Note

Most jobs in this tutorial are started in the background by appending & to the command, which lets you continue working in the shell while they run.

You can watch your running jobs by typing top (hit q to exit top).

You can inspect the log files with e.g. less LOGFILE (press q to quit) or follow them as they grow with tail -f LOGFILE (press Ctrl+C to quit).
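Besides top and the log files, the shell itself can keep track of background jobs. The sketch below uses the bash features $!, jobs and wait; the command sleep 1 and the file demo.log are placeholders standing in for a long-running command such as the MEGAHIT run and its log file:

```shell
# Placeholder job: "sleep 1" stands in for a long-running command like megahit.
sleep 1 > demo.log 2>&1 &
job_pid=$!                 # $! holds the PID of the most recent background job
jobs -l                    # list this shell's background jobs together with their PIDs
tail -n 5 demo.log         # one-off look at the end of the log (does not follow it like tail -f)
if wait "$job_pid"; then   # wait blocks until the job exits and returns its exit status
  echo "job finished successfully"
else
  echo "job failed; check demo.log"
fi
```

Note that wait only knows about jobs started from the same shell session; for jobs started elsewhere, top or the log file remain the way to check progress.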