API

Sequence Composition Metrics

seqm.polydict(seq, nuc='ACGT')

Computes largest homopolymer for all specified nucleotides.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.polydict('AAAACCGT')
{'A': 4, 'C': 2, 'G': 1, 'T': 1}
seqm.polylength(seq)

Calculate length of maximum homopolymer stretch within sequence.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.polylength('AAAACCGT')
4
seqm.entropy(seq)

Calculate Shannon entropy of sequence.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.entropy('AGGATAAG')
1.40
>>> sequtils.entropy('AAAACCGT')
1.75
seqm.gc_percent(seq)

Calculate fraction of GC bases within sequence.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.gc_percent('AGGATAAG')
0.375
seqm.gc_skew(seq)

Calculate GC skew (g-c)/(g+c) for sequence. For homopolymer stretches with no GC, the skew will be rounded to zero.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.gc_skew('AGGATAAG')
3.0
seqm.gc_shift(seq)

Calculate GC shift (a + t)/(g + c) for sequence. For homopolymer stretches with no GC, the shift will be rounded to the number of bases in the sequence.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.gc_shift('AGGATAAG')
1.67
seqm.dna_weight(seq)

Return molecular weight of triphosphate dna sequence (g/mol).

See https://www.thermofisher.com/us/en/home/references/ambion-tech-support/rna-tools-and-calculators/dna-and-rna-molecular-weights-and-conversions.html for details on conversions.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.dna_weight('AGGATAAG')
3968.59
seqm.rna_weight(seq)

Return molecular weight of triphosphate rna sequence (g/mol).

See https://www.thermofisher.com/us/en/home/references/ambion-tech-support/rna-tools-and-calculators/dna-and-rna-molecular-weights-and-conversions.html for details on conversions.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.rna_weight('AGGATAAG')
4082.59
seqm.aa_weight(seq)

Return molecular weight of amino acid sequence (g/mol).

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.aa_weight('AGGATAAG')
700.8
seqm.zipsize(seq)

Calculate size of gzip-compressed sequence.

Parameters

seq (str) – Sequence

Examples

>>> sequtils.zipsize('AGGATAAGAGATAGATTT')
22
seqm.tm(seq, mv=50, dv=1.5, n=0.6, d=50, tp=1, sc=1)

Calculate size of gzip-compressed sequence.

Parameters
  • seq (str) – Sequence

  • mv (float) – Concentration of monovalent cations in mM, by default 50mM

  • dv (float) – Concentration of divalent cations in mM, by default 1.5mM

  • n (float) – Concentration of deoxynycleotide triphosphate in mM, by default 0.6mM

  • d (float) – Concentration of DNA strands in nM, by default 50nM

  • tp (int) –

    Specifies the table of thermodynamic parameters and the method of melting temperature calculation (default 1):

    0 Breslauer et al., 1986 and Rychlik et al., 1990

    (used by primer3 up to and including release 1.1.0).

    1 Use nearest neighbor parameters from SantaLucia 1998

  • sc (int) –

    Specifies salt correction formula for the melting temperature calculation (default 1):

    0 Schildkraut and Lifson 1965, used by primer3 up to

    and including release 1.1.0.

    1 SantaLucia 1998 2 Owczarzy et al., 2004

Examples

>>> sequtils.tm('AGGATAAGAGATAGATTT')
39.31

Domain Conversion

seqm.revcomplement(seq)

Reverse complement sequence.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.revcomplement('AACCTT')
'AAGGTT'
seqm.complement(seq)

Complement sequence.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.complement('AACCTT')
TTGGAA
seqm.aa(seq)

Return amino acid translation of sequence. Ends of the sequences that don’t produce a full codon will be clipped.

Parameters

seq (str) – Nucleotide sequence

Examples

>>> sequtils.aa('ATGTAG')
M*
seqm.likelihood(seq)

Translates quality scores sequence into error likelihoods.

Parameters

seq (str) – Sequence of quality scores.

seqm.qscore(seq)

Translates quality score sequence into phred-scaled likelihoods.

Parameters

seq (str) – Sequence of quality scores.

Sequence Similarity Metrics

seqm.hamming(seq1, seq2)

Calculate hamming distance between sequences.

Parameters
  • seq1 (str) – Reference sequence

  • seq2 (str) – Sequence to compare

Examples

>>> hamming('AACCTT', 'AAGCCTT')
1
seqm.edit(seq1, seq2)

Wrapper around editdistance.eval for fast Levenshtein distance computation.

Parameters
  • seq1 (str) – Reference sequence

  • seq2 (str) – Sequence to compare

Examples

>>> edit('banana', 'bahama')
2

Objects

class seqm.Sequence(sequence)

Object for managing sequence structure and operating on sequences (i.e. getting amino acid sequence, reverse complement, gc content, etc …).

Parameters

sequence (str) – Nucleotide sequence.

Examples

>>> seq = sequtils.Sequence('ACGTACGT')
>>> seq.gc_content
0.25
>>> seq.revcomplement
ACGTACGT
>>> seq.dna_weight
3895.59
aa

Wrapper around sequtils.aa() for the sequtils.Sequence object.

aa_weight

Wrapper around sequtils.aa_weight() for the sequtils.Sequence object.

complement

Wrapper around sequtils.complement() for the sequtils.Sequence object.

dna_weight

Wrapper around sequtils.dna_weight() for the sequtils.Sequence object.

edit(other)

Wrapper around sequtils.edit() for the sequtils.Sequence object.

Parameters

other (str, Sequence) – Sequence to compare.

entropy

Wrapper around sequtils.entropy() for the sequtils.Sequence object.

gc_percent

Wrapper around sequtils.gc_percent() for the sequtils.Sequence object.

gc_shift

Wrapper around sequtils.gc_shift() for the sequtils.Sequence object.

gc_skew

Wrapper around sequtils.gc_skew() for the sequtils.Sequence object.

hamming(other)

Wrapper around sequtils.hamming() for the sequtils.Sequence object.

Parameters

other (str, Sequence) – Sequence to compare.

polydict

Wrapper around sequtils.polydict() for the sequtils.Sequence object.

polylength

Wrapper around sequtils.polylength() for the sequtils.Sequence object.

revcomplement

Wrapper around sequtils.revcomplement() for the sequtils.Sequence object.

rna_weight

Wrapper around sequtils.rna_weight() for the sequtils.Sequence object.

tm

Wrapper around sequtils.zipsize() for the sequtils.Sequence object.

wrap(bases=60)

Wrapper around sequtils.wrap() for the sequtils.Sequence object.

Parameters

bases (int) – Number of bases to include in line.

zipsize

Wrapper around sequtils.zipsize() for the sequtils.Sequence object.