API
Sequence Composition Metrics
-
seqm.polydict(seq, nuc='ACGT')
Compute the length of the longest homopolymer run for each specified nucleotide.
- Parameters
seq (str) – Nucleotide sequence
nuc (str) – Nucleotides to report, by default 'ACGT'
Examples
>>> sequtils.polydict('AAAACCGT')
{'A': 4, 'C': 2, 'G': 1, 'T': 1}
-
seqm.polylength(seq)
Calculate the length of the longest homopolymer stretch within the sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.polylength('AAAACCGT')
4
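The two homopolymer metrics above can be illustrated with a minimal sketch. This is not the library's implementation: the helper names longest_runs and longest_run are hypothetical, and the sketch assumes that a nucleotide absent from the sequence is reported as 0.

```python
from itertools import groupby


def longest_runs(seq, nuc="ACGT"):
    """Longest homopolymer run per nucleotide in `nuc` (hypothetical helper)."""
    # Assumption: nucleotides that never occur are reported as 0.
    best = {base: 0 for base in nuc}
    # groupby yields consecutive runs of identical characters.
    for base, run in groupby(seq):
        if base in best:
            best[base] = max(best[base], sum(1 for _ in run))
    return best


def longest_run(seq):
    """Length of the longest run of any single character (hypothetical helper)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


print(longest_runs("AAAACCGT"))  # {'A': 4, 'C': 2, 'G': 1, 'T': 1}
print(longest_run("AAAACCGT"))   # 4
```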
-
seqm.entropy(seq)
Calculate Shannon entropy of sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.entropy('AGGATAAG')
1.40
>>> sequtils.entropy('AAAACCGT')
1.75
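For reference, Shannon entropy over the base frequencies can be sketched as follows. The function name shannon_entropy is hypothetical, and the sketch assumes a base-2 logarithm (entropy in bits), which is consistent with the 1.75 shown above for 'AAAACCGT'.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy of the character distribution in `seq`, in bits."""
    if not seq:
        return 0.0  # assumption: an empty sequence has zero entropy
    total = len(seq)
    counts = Counter(seq)
    # H = -sum(p * log2(p)) over the observed characters.
    return -sum((n / total) * log2(n / total) for n in counts.values())


print(round(shannon_entropy("AAAACCGT"), 2))  # 1.75
```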
-
seqm.gc_percent(seq)
Calculate fraction of GC bases within sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.gc_percent('AGGATAAG')
0.375
-
seqm.gc_skew(seq)
Calculate GC skew (g - c)/(g + c) for sequence. For sequences with no G or C, the skew is set to zero.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.gc_skew('AGGATAAG')
1.0
-
seqm.gc_shift(seq)
Calculate GC shift (a + t)/(g + c) for sequence. For sequences with no G or C, the shift is set to the number of bases in the sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.gc_shift('AGGATAAG')
1.67
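The three GC metrics follow directly from base counts. The sketch below applies the formulas as stated in the descriptions above; the function names are hypothetical, uppercase input is assumed, and the zero-GC fallbacks mirror the documented behaviour rather than any verified implementation.

```python
def gc_fraction_sketch(seq):
    """Fraction of bases that are G or C."""
    if not seq:
        return 0.0  # assumption: empty input yields 0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew_sketch(seq):
    """(g - c) / (g + c), or 0 when the sequence has no G or C."""
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def gc_shift_sketch(seq):
    """(a + t) / (g + c), or the sequence length when there is no G or C."""
    a, t = seq.count("A"), seq.count("T")
    g, c = seq.count("G"), seq.count("C")
    return (a + t) / (g + c) if g + c else float(len(seq))


print(gc_fraction_sketch("AGGATAAG"))         # 0.375
print(round(gc_shift_sketch("AGGATAAG"), 2))  # 1.67
print(gc_skew_sketch("GGGC"))                 # 0.5
```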
-
seqm.dna_weight(seq)
Return molecular weight of triphosphate DNA sequence (g/mol).
See https://www.thermofisher.com/us/en/home/references/ambion-tech-support/rna-tools-and-calculators/dna-and-rna-molecular-weights-and-conversions.html for details on conversions.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.dna_weight('AGGATAAG')
3968.59
-
seqm.rna_weight(seq)
Return molecular weight of triphosphate RNA sequence (g/mol).
See https://www.thermofisher.com/us/en/home/references/ambion-tech-support/rna-tools-and-calculators/dna-and-rna-molecular-weights-and-conversions.html for details on conversions.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.rna_weight('AGGATAAG')
4082.59
-
seqm.aa_weight(seq)
Return molecular weight of amino acid sequence (g/mol).
- Parameters
seq (str) – Amino acid sequence
Examples
>>> sequtils.aa_weight('AGGATAAG')
700.8
-
seqm.zipsize(seq)
Calculate size of gzip-compressed sequence.
- Parameters
seq (str) – Sequence
Examples
>>> sequtils.zipsize('AGGATAAGAGATAGATTT')
22
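Compressed size is a rough proxy for sequence complexity: repetitive sequences compress better than varied ones. A minimal sketch using the standard-library gzip module is shown below. The name gzip_size is hypothetical, and the exact byte counts depend on the gzip header and compression level, so they need not match the value shown above.

```python
import gzip
import random


def gzip_size(seq):
    """Number of bytes in the gzip-compressed ASCII encoding of `seq`."""
    return len(gzip.compress(seq.encode("ascii")))


# A homopolymer versus a pseudo-random sequence of the same length.
repetitive = "A" * 1000
mixed = "".join(random.Random(0).choices("ACGT", k=1000))

# The repetitive sequence is expected to compress to far fewer bytes.
print(gzip_size(repetitive), gzip_size(mixed))
```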
-
seqm.tm(seq, mv=50, dv=1.5, n=0.6, d=50, tp=1, sc=1)
Calculate melting temperature of sequence.
- Parameters
seq (str) – Sequence
mv (float) – Concentration of monovalent cations in mM, by default 50 mM
dv (float) – Concentration of divalent cations in mM, by default 1.5 mM
n (float) – Concentration of deoxynucleotide triphosphates in mM, by default 0.6 mM
d (float) – Concentration of DNA strands in nM, by default 50 nM
tp (int) – Specifies the table of thermodynamic parameters and the method of melting temperature calculation (default 1):
- 0: Breslauer et al., 1986 and Rychlik et al., 1990 (used by primer3 up to and including release 1.1.0)
- 1: Nearest neighbor parameters from SantaLucia, 1998
sc (int) – Specifies the salt correction formula for the melting temperature calculation (default 1):
- 0: Schildkraut and Lifson, 1965 (used by primer3 up to and including release 1.1.0)
- 1: SantaLucia, 1998
- 2: Owczarzy et al., 2004
Examples
>>> sequtils.tm('AGGATAAGAGATAGATTT')
39.31
Domain Conversion
-
seqm.revcomplement(seq)
Reverse complement sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.revcomplement('AACCTT')
'AAGGTT'
-
seqm.complement(seq)
Complement sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.complement('AACCTT')
'TTGGAA'
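Complementing maps each base to its Watson-Crick partner, and the reverse complement additionally reverses the result. A minimal sketch with str.translate follows. The function names are hypothetical, and leaving characters outside ACGT (such as N) unchanged is an assumption, not documented behaviour.

```python
# Translation table for DNA bases; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_sketch(seq):
    """Complement each base, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement_sketch(seq):
    """Complement each base, then reverse the sequence."""
    return complement_sketch(seq)[::-1]


print(complement_sketch("AACCTT"))          # TTGGAA
print(reverse_complement_sketch("AACCTT"))  # AAGGTT
```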
-
seqm.aa(seq)
Return amino acid translation of sequence. Trailing bases that do not form a complete codon are clipped.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.aa('ATGTAG')
'M*'
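Translation with clipping of an incomplete trailing codon can be sketched as below. This assumes the standard genetic code and uppercase DNA input; the name translate_sketch and the use of 'X' for unrecognised codons are assumptions for illustration, not the library's behaviour.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_sketch(seq):
    """Translate DNA to amino acids, dropping an incomplete trailing codon."""
    usable = len(seq) - len(seq) % 3
    # Assumption: codons with non-ACGT characters are reported as 'X'.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate_sketch("ATGTAG"))   # M*
print(translate_sketch("ATGTAGA"))  # M* (the trailing 'A' is clipped)
```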
-
seqm.likelihood(seq)
Translate a sequence of quality scores into error likelihoods.
- Parameters
seq (str) – Sequence of quality scores.
-
seqm.qscore(seq)
Translate a sequence of quality scores into Phred-scaled likelihoods.
- Parameters
seq (str) – Sequence of quality scores.
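The relationship between a quality string, Phred scores, and error probabilities can be sketched as follows. The encoding is not stated above, so this assumes the common Phred+33 ASCII encoding; the function names and list return types are hypothetical. A Phred score Q corresponds to an error probability of 10^(-Q/10).

```python
def phred_scores_sketch(qual, offset=33):
    """Decode an ASCII quality string into integer Phred scores."""
    # Assumption: Phred+33 encoding, so '!' maps to Q0.
    return [ord(ch) - offset for ch in qual]


def error_probabilities_sketch(qual, offset=33):
    """Convert each Phred score Q into an error probability 10**(-Q/10)."""
    return [10 ** (-q / 10) for q in phred_scores_sketch(qual, offset)]


print(phred_scores_sketch("!+5I"))  # [0, 10, 20, 40]
print(error_probabilities_sketch("!+5I"))
```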
Sequence Similarity Metrics
-
seqm.hamming(seq1, seq2)
Calculate Hamming distance between sequences.
- Parameters
seq1 (str) – Reference sequence
seq2 (str) – Sequence to compare
Examples
>>> hamming('AACCTT', 'AAGCTT')
1
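Hamming distance counts the positions at which two equal-length sequences differ. A minimal sketch is shown below; the name hamming_sketch is hypothetical, and raising on unequal lengths is an assumption, since the handling of that case is not stated here.

```python
def hamming_sketch(seq1, seq2):
    """Number of mismatching positions between two equal-length sequences."""
    # Assumption: unequal lengths are rejected rather than truncated or padded.
    if len(seq1) != len(seq2):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(a != b for a, b in zip(seq1, seq2))


print(hamming_sketch("AACCTT", "AAGCTT"))  # 1
```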
-
seqm.edit(seq1, seq2)
Wrapper around editdistance.eval for fast Levenshtein distance computation.
- Parameters
seq1 (str) – Reference sequence
seq2 (str) – Sequence to compare
Examples
>>> edit('banana', 'bahama')
2
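For readers without the editdistance package, Levenshtein distance can be illustrated with a pure-Python dynamic-programming sketch. This is a slower stand-in for illustration only, not the wrapper described above; the name levenshtein_sketch is hypothetical.

```python
def levenshtein_sketch(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("banana", "bahama"))  # 2
```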
Objects
-
class seqm.Sequence(sequence)
Object for managing sequence structure and operating on sequences (for example, getting the amino acid translation, reverse complement, or GC content).
- Parameters
sequence (str) – Nucleotide sequence.
Examples
>>> seq = sequtils.Sequence('ACGTACGT')
>>> seq.gc_percent
0.5
>>> seq.revcomplement
ACGTACGT
>>> seq.dna_weight
3895.59
-
aa
Wrapper around sequtils.aa() for the sequtils.Sequence object.
-
aa_weight
Wrapper around sequtils.aa_weight() for the sequtils.Sequence object.
-
complement
Wrapper around sequtils.complement() for the sequtils.Sequence object.
-
dna_weight
Wrapper around sequtils.dna_weight() for the sequtils.Sequence object.
-
edit(other)
Wrapper around sequtils.edit() for the sequtils.Sequence object.
- Parameters
other (str, Sequence) – Sequence to compare.
-
entropy
Wrapper around sequtils.entropy() for the sequtils.Sequence object.
-
gc_percent
Wrapper around sequtils.gc_percent() for the sequtils.Sequence object.
-
gc_shift
Wrapper around sequtils.gc_shift() for the sequtils.Sequence object.
-
gc_skew
Wrapper around sequtils.gc_skew() for the sequtils.Sequence object.
-
hamming(other)
Wrapper around sequtils.hamming() for the sequtils.Sequence object.
- Parameters
other (str, Sequence) – Sequence to compare.
-
polydict
Wrapper around sequtils.polydict() for the sequtils.Sequence object.
-
polylength
Wrapper around sequtils.polylength() for the sequtils.Sequence object.
-
revcomplement
Wrapper around sequtils.revcomplement() for the sequtils.Sequence object.
-
rna_weight
Wrapper around sequtils.rna_weight() for the sequtils.Sequence object.
-
tm
Wrapper around sequtils.tm() for the sequtils.Sequence object.
-
wrap(bases=60)
Wrapper around sequtils.wrap() for the sequtils.Sequence object.
- Parameters
bases (int) – Number of bases to include in line.
-
zipsize
Wrapper around sequtils.zipsize() for the sequtils.Sequence object.
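The wrapper pattern used by the Sequence object, in which attributes delegate to module-level functions, might look roughly like the sketch below. SequenceSketch and its helper functions are hypothetical stand-ins written for illustration; the return types (plain strings and floats) and the newline-joined output of wrap are assumptions rather than documented behaviour.

```python
def _gc_fraction(seq):
    """Fraction of G and C bases (stand-in for a module-level function)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def _reverse_complement(seq):
    """Reverse complement (stand-in for a module-level function)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


class SequenceSketch:
    """Hypothetical object whose members wrap the functions above."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __str__(self):
        return self.sequence

    @property
    def gc_percent(self):
        return _gc_fraction(self.sequence)

    @property
    def revcomplement(self):
        return _reverse_complement(self.sequence)

    def hamming(self, other):
        # str() lets `other` be a plain string or another SequenceSketch.
        return sum(a != b for a, b in zip(self.sequence, str(other)))

    def wrap(self, bases=60):
        # Assumption: wrapping returns lines of `bases` characters joined by newlines.
        seq = self.sequence
        return "\n".join(seq[i:i + bases] for i in range(0, len(seq), bases))


seq = SequenceSketch("AAAACCGT")
print(seq.gc_percent)     # 0.375
print(seq.revcomplement)  # ACGGTTTT
print(seq.wrap(bases=4))  # AAAA, then CCGT on the next line
```

Exposing computed values as properties keeps call sites short (seq.gc_percent rather than gc_percent(seq)), while methods that need an argument, such as hamming(other) and wrap(bases), remain ordinary methods.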