TOPHAT is a program that aligns RNA-Seq reads to a genome, using the
short-read aligner bowtie or bowtie2, in order to identify exon-exon
splice junctions.
TOPHAT accepts reads in FASTA or FASTQ format. TOPHAT was designed to
work with reads produced by the Illumina Genome Analyzer, although users
have been successful with reads from other technologies. Support for the
Applied Biosystems colorspace format was added in version 1.1.0. The
software is optimized for reads that are 75 base pairs or longer.
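Because TOPHAT is optimized for reads of 75 base pairs or more, it can be
worth checking read lengths before a run. The following is a minimal sketch,
not part of TOPHAT itself. It assumes an uncompressed FASTQ file with exactly
four lines per record; the file name sample_reads.fq and its contents are
made up for illustration.

```shell
#!/bin/bash
# Hypothetical demo input: three FASTQ records of four lines each.
cat > sample_reads.fq <<'EOF'
@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGTAC
+
IIIIII
@read3
ACGTACGTACGT
+
IIIIIIIIIIII
EOF

# In four-line FASTQ, the sequence is the second line of each record,
# so it is selected with NR % 4 == 2.
# Report the number of reads and the shortest and longest read length.
read_stats=$(awk 'NR % 4 == 2 {
    n++
    len = length($0)
    if (n == 1 || len < min) min = len
    if (len > max) max = len
}
END { printf "reads=%d min=%d max=%d", n, min, max }' sample_reads.fq)

echo "$read_stats"
```

For the demo file this prints "reads=3 min=6 max=12". On real data, a
minimum well below 75 suggests that the reads are shorter than the lengths
TOPHAT is tuned for.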
TOPHAT works by mapping RNA-Seq reads to a reference genome, identifying
candidate exons, and building a database of possible splice junctions.
It first identifies regions of the reference genome where reads map end to
end and pile up, terming these regions "islands". Many of
these islands will be exons. TOPHAT then runs an internal program to
find splice junctions in the unmapped reads, and generates an alignment
file that can be used in the cufflinks pipeline.
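The island idea can be illustrated with a small interval-merging sketch.
This is not TOPHAT's actual implementation. It only assumes that each mapped
read is listed on one line as chromosome, start, and end (a hypothetical,
made-up input file named mapped_reads.txt), and it merges overlapping reads
into islands.

```shell
#!/bin/bash
# Hypothetical mapped-read intervals: chromosome, start, end (inclusive).
cat > mapped_reads.txt <<'EOF'
chr1 100 150
chr1 120 170
chr1 400 450
chr1 160 210
chr2 50 100
chr1 430 480
EOF

# Sort by chromosome and then by start position, and extend the current
# island whenever the next read starts at or before the island's end.
# Each output line is: chromosome, island start, island end, read count.
islands=$(sort -k1,1 -k2,2n mapped_reads.txt | awk '
function emit() { if (chrom != "") print chrom, start, end, count }
{
    if ($1 == chrom && $2 <= end) {
        if ($3 > end) end = $3
        count++
    } else {
        emit()
        chrom = $1; start = $2; end = $3; count = 1
    }
}
END { emit() }')

echo "$islands"
```

For the demo input, the six reads collapse into three islands: chr1 100-210
(3 reads), chr1 400-480 (2 reads), and chr2 50-100 (1 read). The gap between
the two chr1 islands is the kind of region where a splice junction might be
sought.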
The TOPHAT web page at Johns Hopkins University:
The TOPHAT documentation page:
C Trapnell, L Pachter, SL Salzberg,
Tophat: discovering splice junctions with RNA-Seq,
Volume 25, Number 9, pages 1105-1111, May 2009.
On any ARC cluster, check the installation details
by typing "module spider tophat".
TOPHAT requires that the appropriate modules be loaded before it can
be used. One version of the appropriate commands for use on NewRiver is:
module purge
module load gcc/5.2.0
module load python/2.7.10
module load boost/1.58.0
module load bowtie2/2.2.5
module load tophat/2.1
The following batch file uses TOPHAT to analyze FASTQ data.
#! /bin/bash
#
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=1
#PBS -W group_list=newriver
#PBS -q open_q
#PBS -j oe
#
cd $PBS_O_WORKDIR
#
module purge
module load gcc/5.2.0
module load python/2.7.10
module load boost/1.58.0
module load bowtie2/2.2.5
module load tophat/2.1
#
tophat -r 20 test_ref reads_1.fq reads_2.fq
A complete set of files to carry out a similar process is available in