samtools


Introduction:

SAMTOOLS is a set of tools for working with high-throughput sequencing data.

SAMTOOLS is associated with the SAM (Sequence Alignment/Map) format,
which is designed to store large nucleotide sequence alignments. The format:

  • can store information generated by various alignment programs;
  • is simple enough to be easily generated or converted;
  • allows most operations on alignments to be done in a streaming fashion;
  • allows the file to be indexed by genomic position.

The SAMTOOLS utilities manipulate alignments in the BAM format. They
import from and export to the SAM format, perform sorting, merging, and
indexing, and allow the reads in any region to be retrieved quickly.
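
The sorting, indexing, and region retrieval steps can be sketched as a
small shell function. This is only an illustration: the function name
sort_index_fetch and its arguments are hypothetical, and the "sort"
options (-T for a temporary file prefix, -o for the output file) assume
a samtools 1.x release.

```shell
# Hypothetical helper: sort a BAM file by coordinate, index it,
# and print the reads that overlap one region.
# Assumes samtools 1.x "sort" syntax (-T temp prefix, -o output file).
sort_index_fetch() {
    in_bam=$1      # unsorted input BAM file
    out_bam=$2     # coordinate-sorted output BAM file
    region=$3      # region string, for example chr1:1000-2000

    # Each step runs only if the previous one succeeded.
    samtools sort -T "${out_bam%.bam}.tmp" -o "$out_bam" "$in_bam" &&
    samtools index "$out_bam" &&
    samtools view "$out_bam" "$region"
}
```

A call might look like
"sort_index_fetch sim_reads_aligned.bam sim_reads_sorted.bam chr1:1000-2000",
where the region name chr1 is a placeholder for a sequence name in your
own reference.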

SAMTOOLS is designed to work on a stream. It regards an input file
'-' as the standard input (stdin) and an output file '-' as the
standard output (stdout). Several commands can thus be combined with
Unix pipes. SAMTOOLS always outputs warning and error messages to the
standard error output (stderr).
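
As a sketch of how the stdin/stdout convention lets commands be chained,
the function below converts a SAM file to a sorted BAM file without
writing the intermediate unsorted BAM to disk. The function name
sam_to_sorted_bam and the file names are hypothetical, and the "sort"
options assume a samtools 1.x release.

```shell
# Hypothetical helper: SAM -> sorted BAM in a single pipeline.
sam_to_sorted_bam() {
    in_sam=$1      # input SAM file
    out_bam=$2     # sorted output BAM file

    # "view" has no -o option here, so it writes BAM to stdout;
    # "sort" reads that stream from stdin because its input is '-'.
    samtools view -b -S "$in_sam" |
        samtools sort -T "${out_bam%.bam}.tmp" -o "$out_bam" -
}
```

Because warnings and errors go to stderr, they do not contaminate the
BAM data flowing through the pipe.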

SAMTOOLS can also open a BAM (but not SAM) file on a remote FTP or
HTTP server if the BAM file name starts with 'ftp://' or 'http://'.
SAMTOOLS looks for the index file in the current working directory and
downloads it if it is not found there. SAMTOOLS does not retrieve the
entire alignment file unless it is asked to do so.
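
A minimal sketch of remote access follows. The function name
fetch_remote_region is hypothetical, and any URL passed to it (such as
http://example.org/data/sample.bam) is a placeholder, not a real data
set. The remote server is assumed to hold the matching index file
alongside the BAM file.

```shell
# Hypothetical helper: print the reads from one region of a remote BAM file.
fetch_remote_region() {
    url=$1         # placeholder, e.g. http://example.org/data/sample.bam
    region=$2      # region string, for example chr1:1000-2000

    # Only the two URL prefixes named in the text are accepted.
    case $url in
        ftp://*|http://*)
            samtools view "$url" "$region"
            ;;
        *)
            echo "expected an ftp:// or http:// URL: $url" >&2
            return 1
            ;;
    esac
}
```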

Web Site:

The SAMTOOLS home page at SourceForge:

http://samtools.sourceforge.net/

Reference:

  • SAMTOOLS documentation page at htslib.org:

    http://www.htslib.org/doc/samtools.html

  • Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan,
    Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin,
    the 1000 Genome Project Data Processing Subgroup,
    The Sequence Alignment/Map format and SAMtools,
    Bioinformatics,
    Volume 25, pages 2078-2079, 2009.

Usage:

On any ARC cluster, check the installation details
by typing "module spider samtools".

SAMTOOLS requires that the appropriate modules be loaded before it can
be used. One version of the appropriate commands for use on NewRiver is:

module purge
module load gcc/5.2.0
module load samtools/1.2
    

Examples:

The following batch file converts a SAM file to BAM format:

#! /bin/bash
#
#PBS -l walltime=0:05:00
#PBS -l nodes=1:ppn=1
#PBS -W group_list=newriver
#PBS -q open_q
#PBS -j oe
#
cd $PBS_O_WORKDIR
#
module purge
module load gcc/5.2.0
module load samtools/1.2
#
#  Convert a SAM file to BAM format.
#  * -b indicates the output should be in BAM format;
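#  * -S indicates the input is in SAM format (samtools 1.0 and later detect
#    the input format automatically and accept -S only for compatibility);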
#  * -o names the output file.
#
samtools view -b -S -o sim_reads_aligned.bam sim_reads_aligned.sam

A complete set of files to carry out a similar process is available in
samtools_example.tar.
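
After a conversion like the one in the batch file, it can be useful to
confirm that the BAM file is readable. The sketch below is one way to do
that; the function name check_bam is hypothetical, and it relies only on
the "view -H" and "flagstat" commands.

```shell
# Hypothetical helper: quick sanity check of a BAM file.
check_bam() {
    bam=$1         # BAM file to inspect

    # Print only the header lines (@HD, @SQ, ...), then, if that worked,
    # print read counts grouped by flag category.
    samtools view -H "$bam" &&
    samtools flagstat "$bam"
}
```

For the example above, the call would be
"check_bam sim_reads_aligned.bam", placed after the conversion command
in the batch file.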