Commands
compare_genomes.py
Compares genomes using an aligned fasta file and migrates annotations from a reference to the other sequences in the alignment
Usage:
compare_genomes.py [-r REFERENCE] [--align_format FORMAT] [-o PREFIX]
[--gff_feature_types GFF_FEATURE_TYPES]
[--gff_attributes GFF_ATTRIBUTES] [-v] [--version]
[-h]
aligned_fasta gene_gff
Required Arguments:
aligned_fasta An aligned fasta file
gene_gff A GFF file with features to be migrated
-r REFERENCE, --reference REFERENCE
Sequence id of reference sequence in aligned fasta
file
Optional Arguments:
--align_format FORMAT
Alignment format (default: fasta)
-o PREFIX, --output PREFIX
Output prefix (default: compare_genomes_output/)
--gff_feature_types GFF_FEATURE_TYPES
Comma separated list of gff feature types to parse
(default: CDS,exon,gene,mRNA,stem_loop)
--gff_attributes GFF_ATTRIBUTES
Comma separated list of feature attributes to carry
over (default: ID,Parent,Note,gene,function,product)
-v, --verbose verbose output
--version show program's version number and exit
-h, --help show this help message and exit
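Migrating an annotation through an alignment comes down to translating reference coordinates into coordinates on another aligned sequence. The following is a minimal sketch of that idea, not the code used by compare_genomes.py. The function names, the 0-based half-open coordinates, and the use of "-" as the gap character are assumptions made for illustration.

```python
def build_coordinate_map(ref_aligned, target_aligned, gap="-"):
    """Map 0-based ungapped reference positions to ungapped target positions.

    Reference bases aligned to a gap in the target map to None.
    (Hypothetical helper; gap character and coordinates are assumptions.)
    """
    if len(ref_aligned) != len(target_aligned):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = 0
    target_pos = 0
    for ref_char, target_char in zip(ref_aligned, target_aligned):
        ref_is_base = ref_char != gap
        target_is_base = target_char != gap
        if ref_is_base:
            mapping[ref_pos] = target_pos if target_is_base else None
            ref_pos += 1
        if target_is_base:
            target_pos += 1
    return mapping


def migrate_interval(mapping, start, end):
    """Migrate the half-open reference interval [start, end) to the target.

    Returns (start, end) on the target, or None if no reference base in
    the interval aligns to a target base.
    """
    hits = [mapping[p] for p in range(start, end) if mapping.get(p) is not None]
    if not hits:
        return None
    return min(hits), max(hits) + 1


if __name__ == "__main__":
    coord_map = build_coordinate_map("AC-GTAC", "ACTG--C")
    print(migrate_interval(coord_map, 1, 3))
```

A feature that falls entirely inside a deletion in the target has no migrated coordinates, which is why `migrate_interval` can return None.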
fastq_to_fasta.py
Convert a FASTQ file to a FASTA file
Usage:
fastq_to_fasta.py [-h] [-w WRAP] [-v] [--version] fastq_file fasta_file
Required Arguments:
fastq_file
fasta_file
Optional Arguments:
-h, --help show this help message and exit
-w WRAP, --wrap WRAP Maximum length of lines, 0 means do not wrap (default:
0)
-v, --verbose verbose output
--version show program's version number and exit
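The conversion and the effect of the wrap option can be illustrated with a short sketch. It assumes simple four-line FASTQ records (no multi-line sequences) and is not the script's actual implementation. The function name `fastq_to_fasta` and its signature are hypothetical.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle, wrap=0):
    """Convert four-line FASTQ records to FASTA; return the record count.

    wrap is the maximum sequence line length; 0 means do not wrap.
    (Illustrative sketch; assumes each record is exactly four lines.)
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of record: %r" % header)
        sequence = fastq_handle.readline().rstrip("\r\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        fasta_handle.write(">" + header[1:] + "\n")
        if wrap > 0:
            for i in range(0, len(sequence), wrap):
                fasta_handle.write(sequence[i:i + wrap] + "\n")
        else:
            fasta_handle.write(sequence + "\n")
        count += 1
    return count


if __name__ == "__main__":
    source = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n")
    target = io.StringIO()
    fastq_to_fasta(source, target, wrap=5)
    print(target.getvalue(), end="")
```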
find_contig_deletions.py
Find contigs with deletions from the contig composition file output from compare_genomes.py
Usage:
find_contig_deletions.py [-h] [-o OUTPUT_DIR] [-q] [-v] [--version]
contig_composition aligned_fasta contigs_fasta
Required Arguments:
contig_composition Contig composition file output from compare_genomes.py
aligned_fasta Aligned FASTA file
contigs_fasta Contigs FASTA file
Optional Arguments:
-h, --help show this help message and exit
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Directory to store output files, default is
aligned_fasta directory
-q, --quiet Quiet, replace all deletions found, no prompts
-v, --verbose verbose output
--version show program's version number and exit
gff2gtf_simple.py
Simple conversion of GFF files to GTF files.
Usage:
gff2gtf_simple.py [-h] [-v] [--version] gff_file
Required Arguments:
gff_file GFF file to convert
Optional Arguments:
-h, --help show this help message and exit
-v, --verbose verbose output
--version show program's version number and exit
maf_net.py
Output an aligned fasta file by stitching together a specified reference sequence in the MAF file and using the highest scoring block for each section.
Usage:
maf_net.py [-r REFERENCE] [-c CHROMOSOME] [-s SPECIES] [-o OUTPUT_DIR]
[--consensus_sequence] [--reference_fasta REFERENCE_FASTA]
[-v] [--version] [-h]
maf_file
Required Arguments:
maf_file MAF file to stitch together
-r REFERENCE, --reference REFERENCE
Reference species (e.g. scerevisiae)
-c CHROMOSOME, --chromosome CHROMOSOME
Sequence ID of the chromosome for which to generate
the alignment net (e.g. chrI)
-s SPECIES, --species SPECIES
List of species to include, comma separated (e.g.
scerevisiae,sbayanus)
Optional Arguments:
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Directory to store output file, default is maf file
directory
--consensus_sequence Output "consensus sequence" for each species in files
named [species].[chromosome].consensus.fasta
--reference_fasta REFERENCE_FASTA
Check MAF file against this fasta (for
troubleshooting, debugging)
-v, --verbose verbose output
--version show program's version number and exit
-h, --help show this help message and exit
Output:
Aligned Fasta File: BASENAME.net.afa
This file contains an aligned fasta file created by stitching together MAF blocks based on the reference sequence. Where two blocks overlap, the higher scoring block is used.
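The overlap rule described above can be sketched as follows. This is an illustration under stated assumptions, not maf_net.py's implementation: blocks are reduced to hypothetical (start, end, score, label) tuples in 0-based half-open reference coordinates, and the per-base array is chosen for clarity rather than efficiency.

```python
def stitch_blocks(blocks, ref_length):
    """Choose the highest scoring block at every reference position.

    blocks: list of (start, end, score, label) tuples (hypothetical layout).
    Returns a list of (start, end, label) intervals covering the reference;
    label is None where no block covers the reference.
    """
    best = [None] * ref_length  # index of the winning block per position
    for idx, (start, end, score, _label) in enumerate(blocks):
        for pos in range(max(start, 0), min(end, ref_length)):
            if best[pos] is None or score > blocks[best[pos]][2]:
                best[pos] = idx

    # Collapse runs of positions won by the same block into intervals.
    intervals = []
    run_start = 0
    for pos in range(1, ref_length + 1):
        if pos == ref_length or best[pos] != best[run_start]:
            idx = best[run_start]
            label = None if idx is None else blocks[idx][3]
            intervals.append((run_start, pos, label))
            run_start = pos
    return intervals


if __name__ == "__main__":
    example = [(0, 6, 10.0, "blockA"), (4, 10, 25.0, "blockB")]
    for interval in stitch_blocks(example, 12):
        print(interval)
```

In this sketch, ties go to the block listed first, because a later block replaces an earlier one only when its score is strictly higher. How the real script breaks ties is not documented here.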
Optional Output (one per species):
Consensus Sequence: SPECIES.consensus.fasta
A FASTA file containing the consensus sequence for this species. N’s in the sequence represent sections where no contigs mapped to a section of the reference (i.e. potential gaps in the scaffold).
Consensus Contig Composition GFF: SPECIES.consensus_contig_composition.gff
GFF-formatted file describing intervals in the SPECIES genome. The attributes of each interval contain information about the contigs used to determine its sequence. The attributes are:
- src_seq
- src_seq_start
- src_seq_end
- src_strand
- src_size
- maf_block
- block_start
- block_end
- ref_src
- ref_start
- ref_end
- ref_strand
Consensus Contig Composition Summary: SPECIES.consensus_contig_composition_summary.txt
Tab-delimited file that describes intervals in the SPECIES genome and the contigs that were used for the sequence. It has the following columns:
- seq - sequence id of the interval in the SPECIES genome
- start - start position of the interval
- end - end position of the interval
- contig - contig id that was used to “build” this interval. If None, that means no contig was found for the analogous region in the reference.
- contig_start - the start position of the contig that aligned to the start position of this interval
- contig_end - the end position of the contig that aligned to the end position of this interval
- contig_strand - the direction that the contig aligned to the reference (if ‘-‘, the reverse complement of the contig aligned to the reference in this interval)
- contig_size - the full size of the contig (including those bases that did not align to this interval)
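The summary file can be read with a few lines of Python, for example to list intervals for which no contig was found. This sketch assumes the columns appear in the order listed above, that a missing contig is written as the literal string None, and that a header line is optional. None of these details is confirmed by this page, and the function names are hypothetical.

```python
import csv
import io

# Column order taken from the list above (assumed to match the file layout).
COLUMNS = ["seq", "start", "end", "contig", "contig_start",
           "contig_end", "contig_strand", "contig_size"]


def read_summary(handle, has_header=False):
    """Read the tab-delimited summary into a list of dicts keyed by COLUMNS."""
    rows = []
    for line_number, fields in enumerate(csv.reader(handle, delimiter="\t")):
        if not fields or (has_header and line_number == 0):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def uncovered_intervals(rows):
    """Return (seq, start, end) for intervals where no contig was found."""
    return [(row["seq"], int(row["start"]), int(row["end"]))
            for row in rows if row["contig"] == "None"]


if __name__ == "__main__":
    text = ("chrI\t1\t100\tcontig_7\t5\t104\t+\t2000\n"
            "chrI\t101\t150\tNone\t\t\t\t\n")
    print(uncovered_intervals(read_summary(io.StringIO(text))))
```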
makePairedOutput2EQUALfiles_vamp.pl
A modified version of a script provided by SSAKE, used to prepare two separate paired-end fastq files for use by SSAKE. The modifications accommodate the new Illumina-style sequence identifiers introduced with CASAVA 1.8:
Usage: makePairedOutput2EQUALfiles_vamp.pl <fasta file 1> <fasta file 2> <library insert size>
--- ** Both files must have the same number of records & arranged in the same order
makePairedOutput2UNEQUALfiles_vamp.pl
See makePairedOutput2EQUALfiles_vamp.pl.
Usage: makePairedOutput2UNEQUALfiles_vamp.pl <fasta file 1> <fasta file 2> <library insert size>
--- files could have different # of records & arranged in different order but template ids must match
TQSfastq_vamp.py
Performs quality trimming as per the original SSAKE script. It was modified to accommodate larger, zipped fastq files.
Usage:
TQSfastq_vamp.py [options]
Optional Arguments:
-h, --help show this help message and exit
-f FASTQFILE, --fastq file=FASTQFILE
Sanger encoded fastq file - PHRED quality scores,
ASCII+33
-t THRESHOLD, --Phred quality threshold=THRESHOLD
Base intensity threshold value (Phred quality scores 0
to 40, default=10)
-c CONSEC, --consec=CONSEC
Minimum number of consecutive bases passing threshold
values (default=20)
-v, --verbose Runs in Verbose mode.
-q, --qualities Outputs Qualities to FASTQ file (default is FASTA)
-z, --zip Compress output with gzip
-o OUTPUT_BASE, --output=OUTPUT_BASE
Output filename base
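The threshold and consec options suggest trimming each read to a run of consecutive bases whose Phred score passes the threshold. The sketch below shows one plausible reading of that rule: keep the longest such run, and discard the read if the run is shorter than consec. Whether the script compares with >= or >, and whether it keeps the longest or the first passing run, are assumptions. The function name `trim_read` is hypothetical.

```python
def trim_read(sequence, quality, threshold=10, consec=20, offset=33):
    """Return the longest run of bases with Phred score >= threshold.

    quality is a Sanger-encoded string (ASCII+33 by default).
    Returns (trimmed_sequence, trimmed_quality), or None if the longest
    passing run is shorter than consec. (Illustrative sketch only.)
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have equal length")
    scores = [ord(char) - offset for char in quality]
    best_start, best_len = 0, 0
    run_start = 0
    # Iterate one position past the end so the final run is also closed.
    for i in range(len(scores) + 1):
        if i == len(scores) or scores[i] < threshold:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = i + 1
    if best_len < consec:
        return None
    end = best_start + best_len
    return sequence[best_start:end], quality[best_start:end]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 with ASCII+33.
    print(trim_read("ACGTACGTAC", "II#IIIII#I", threshold=10, consec=4))
```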
translate_cds.py
Extracts the coding sequence (CDS) regions from a FASTA reference and GFF file and translates them into amino acid sequences, output in FASTA format to STDOUT
Usage:
translate_cds.py [--notrans] [-i IDATTR] [-t FEATURETYPE]
[--table TABLE] [-v] [--version] [-h]
gff_file fasta_file
Required Arguments:
gff_file GFF file containing CDS records to be translated
fasta_file FASTA file containing the nucleotide sequences
referenced in the GFF file
Optional Arguments:
--notrans Do not translate to amino acid sequence, output DNA
-i IDATTR, --idattr IDATTR
GFF attribute to use as gene ID. Features with the
same ID will be considered parts of the same gene. The
default "gene_id" is suitable for GTF files.
-t FEATURETYPE, --featuretype FEATURETYPE
GFF feature type(s) (3rd column) to be used. Specify
the option multiple times for multiple feature types.
The default is "CDS" for GFF files and "CDS" and
"stop_codon" for GTF files.
--table TABLE NCBI Translation table to use when translating DNA
(see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi).
Default: 1.
-v, --verbose verbose output
--version show program's version number and exit
-h, --help show this help message and exit
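The extract-and-translate step can be sketched without external libraries. This is an illustration, not the script's implementation: it supports only NCBI translation table 1 (the standard code), ignores the GFF phase column, and takes CDS segments as hypothetical (start, end) tuples in 1-based inclusive GFF coordinates.

```python
from itertools import product

# NCBI translation table 1 (standard code), codons ordered TCAG at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def extract_cds(chromosome, segments, strand="+"):
    """Join CDS segments given as 1-based inclusive (start, end) pairs.

    Segments are concatenated in ascending order; for '-' strand genes the
    joined sequence is reverse complemented. Phase is ignored (assumption).
    """
    dna = "".join(chromosome[start - 1:end] for start, end in sorted(segments))
    return reverse_complement(dna) if strand == "-" else dna


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


if __name__ == "__main__":
    chrom = "GGATGGCTAAACCCTGATT"
    cds = extract_cds(chrom, [(12, 17), (3, 8)], strand="+")
    print(">example_gene")
    print(translate(cds))
```

Stop codons are written as "*" in this sketch. How translate_cds.py reports them is not documented on this page.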