ABySS

Introduction

ABySS is a de novo, parallel, paired-end sequence assembler designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and can assemble larger genomes.

The project web site is http://www.bcgsc.ca/platform/bioinfo/software/abyss

A reference for ABySS is:

  1. Jared Simpson, Kim Wong, Shaun Jackman, Jacqueline Schein, Steven Jones, Inanc Birol,
    ABySS: A parallel assembler for short read sequence data,
    Genome Research, Volume 19, June 2009, pages 1117-1123.

Availability

On DragonsTooth, HokieSpeed, and NewRiver, the most recent installed version of ABySS is 1.5.2. On BlueRidge, the most recent installed version is 1.3.5. To check which versions of ABySS are currently available on any ARC cluster, type "module spider abyss".

Usage

ABySS requires that several modules be loaded before it can be run. One sequence of commands that loads them is:

      module purge
      module load gcc/5.2.0
      module load openmpi/1.8.5
      module load python
      module load boost
      module load sparsehash
      module load abyss/1.5.2
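
After loading the modules, it can help to confirm that the ABySS programs used later on this page are actually on the PATH before submitting a job. The following is a minimal sketch, not part of the ABySS distribution; the helper name check_tools is hypothetical, and it relies only on the standard shell builtin "command -v".

```shell
#!/bin/bash
# Hypothetical helper: report any named command that is not on the PATH.
# Returns 0 if every command was found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# abyss-pe and abyss-fac are the two programs used in the example below.
if check_tools abyss-pe abyss-fac; then
    echo "ABySS tools found"
else
    echo "Load the modules listed above, then try again"
fi
```

If either program is reported missing, "module list" shows which modules are currently loaded.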

Examples

Here is an example batch script that uses ABySS. Two data files in FASTQ format are supplied as input.

#! /bin/bash
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=4
#PBS -W group_list=newriver
#PBS -q open_q
#PBS -j oe

cd "$PBS_O_WORKDIR"

module purge
module load gcc/5.2.0
module load openmpi/1.8.5
module load python
module load boost
module load sparsehash
module load abyss/1.5.2
#
#  Assemble a small synthetic dataset
#
abyss-pe k=25 name=test in='reads1.fastq reads2.fastq'
#
#  Calculate assembly contiguity statistics.
#
abyss-fac test-unitigs.fa

A complete set of files to carry out a similar example calculation is available in a tar file.
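
The batch script above requests four cores (ppn=4), but its abyss-pe line does not ask for any MPI processes, so as written the assembly appears to run on a single core. The abyss-pe driver accepts an np parameter that sets the number of MPI processes. The sketch below shows one way to match np to the cores granted by the scheduler. It assumes a PBS/Torque job, where PBS_NODEFILE lists one line per allocated core. The helper names count_cores and build_abyss_cmd are hypothetical, and the sketch only prints the command line rather than running it.

```shell
#!/bin/bash
# Count the cores granted to this job. Inside a PBS/Torque job, PBS_NODEFILE
# names a file with one line per allocated core; outside a job, fall back to 1.
count_cores() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 1
    fi
}

# Hypothetical helper: print an abyss-pe command line instead of running it.
# Arguments: process count, k-mer size, assembly name, then the read files.
build_abyss_cmd() {
    local np=$1 k=$2 name=$3
    shift 3
    printf "abyss-pe np=%s k=%s name=%s in='%s'\n" "$np" "$k" "$name" "$*"
}

# Same k, name, and input files as the example script above.
build_abyss_cmd "$(count_cores)" 25 test reads1.fastq reads2.fastq
```

In a job script, the printed line would replace the original abyss-pe command. Whether the MPI run is faster for a dataset as small as the example is not guaranteed, so it is worth timing both variants.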