Seqool is a sequence analysis software package, free for educational use, designed primarily for searching for biological signals in nucleic acid sequences. The package provides several pattern recognition models and also includes the most common sequence analysis statistics, such as GC content and codon usage.

The Seqool sequence analysis software offers several pattern recognition methods for searching for biological signals, such as splice sites or user-specific signals. Pattern recognition models include weight matrices (profiles), position specific score matrices, binding energy based signal search models (e.g. for snRNAs), maximum dependence decomposition models, profile hidden Markov models, and models scoring the composition of a sequence. Models can be combined, e.g. using decision trees or neural networks, to construct more refined models or to merge information from several models or sequence statistics into a final classification of a signal.

The Seqool sequence analysis tool also includes text search algorithms for the identification of simple signals. Support for IUPAC nucleic acid codes (such as y for pyrimidines) allows less strict text searches. Over-represented oligonucleotides ("words") can be identified and optionally clustered into groups of similar words. Additional features of the sequence analysis software include the calculation of sequence composition statistics (GC, codon usage, nucleotide and oligonucleotide frequencies) and a manipulation and extraction tool for sequence or text files.
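To illustrate what a text search with IUPAC codes involves, here is a minimal Python sketch. It is not Seqool's implementation; the function names `iupac_to_regex` and `find_iupac` are hypothetical and chosen only for this example. Each IUPAC code is expanded to a regular-expression character class, and a lookahead is used so that overlapping matches are reported too.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, expanded to regex character classes.
# U is treated as T so that RNA patterns can be searched in DNA-style input.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern (hypothetical helper) into a regex string."""
    return "".join(IUPAC[code] for code in pattern.upper())


def find_iupac(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # The zero-width lookahead lets the regex engine report overlapping hits.
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    normalized = sequence.upper().replace("U", "T")
    return [match.start() for match in regex.finditer(normalized)]


if __name__ == "__main__":
    # "YAG" matches CAG and TAG (y = pyrimidine), but not GAG.
    print(find_iupac("yag", "ACAGTTAGGAG"))
```

A pattern such as "yag" therefore finds both CAG and TAG in a single pass, which is what makes the search less strict than an exact text search.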


Basic sequence analysis:

    • Nucleotide composition, oligo-nucleotide composition
    • GC content, codon usage, codon preference
    • Calculation of over- or under-represented oligo-nucleotides
    • Calculation within windows of a given size or for whole sequences, for single sequences or several sequences together
    • Extraction of sequences with a given composition (e.g. sequences with a GC content lower than 0.41)
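The statistics listed above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (GC content computed over unambiguous bases, sequences held in a plain dictionary); the function names `gc_content`, `windowed_gc` and `filter_by_gc` are hypothetical and do not describe Seqool's interface.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T, U)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        return 0.0  # empty or fully ambiguous sequence
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window, step=1):
    """GC content for each window of the given size, moved along by `step`."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


def filter_by_gc(records, max_gc):
    """Keep only sequences whose overall GC content is lower than `max_gc`."""
    return {name: seq for name, seq in records.items() if gc_content(seq) < max_gc}


if __name__ == "__main__":
    records = {"seq1": "ATATGC", "seq2": "GGCCAT"}
    print(windowed_gc("AAGG", window=2))
    print(filter_by_gc(records, max_gc=0.41))
```

The final call mirrors the example in the list: only sequences with a GC content lower than 0.41 are extracted.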

Signal search:

    • Exact text search, text search using IUPAC codes (e.g. "y" for pyrimidines), search of repeats, stop and start codons, restriction sites
    • Profiles (weight matrices/position specific score matrices)
    • Profile hidden Markov models
    • Maximum dependence decomposition
    • Oligo-nucleotide frequency models / models for sequence composition (e.g. GC, codon usage, codon preference, frequencies of nucleotides or oligo-nucleotides)
    • Search for RNA binding motifs (based on binding energy, e.g. snRNPs)
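As an illustration of the simplest model in this list, the following Python sketch builds a log-odds weight matrix from aligned example sites and scans a sequence with it. It is a generic textbook formulation, not Seqool's implementation. The pseudocount, the uniform background frequency of 0.25, the threshold, and the function names `build_pwm`, `score` and `scan` are all assumptions made for this example.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length, aligned ACGT sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        # log2 of (smoothed observed frequency / background frequency) per base
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the position-specific scores for one window (uppercase ACGT only)."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (position, score) for every window scoring at least `threshold`."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window_score = score(pwm, sequence[start:start + width])
        if window_score >= threshold:
            hits.append((start, window_score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(["GT", "GT", "GT", "GT"])  # toy sites, not real data
    print(scan(pwm, "AAGTAA", threshold=0.0))
```

Scores from several such models can then be added, subtracted, or fed into a classifier, which is the idea behind the model combinations listed in the next section.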

Combination of models:

    • Decision trees
    • Neural networks (backpropagation networks)
    • Model combinations by addition or subtraction of scores (hybrid models)
    • Models scoring the distance between signals

File format support:

    • Support of the most common sequence file formats, such as FastA, GenBank, GCG, EMBL, and plain sequences (raw).
    • A comprehensive sequence and text formatting and extraction tool (FastAFormat) which allows the extraction of sequences from virtually any file format.

