To implement the informatics method, SCUO, we created a codon table for the amino acids that have more than one codon, indexed in an arbitrary way, so that we can refer unambiguously to the jth (degenerate) codon of amino acid i, $1 \le i \le 18$. In mycoplasmas, Trp was also included in the codon table because the standard stop codon TGA encodes Trp in these species, so that $1 \le i \le 19$. To simplify the explanation, the following description is based only on the standard genetic code, although the actual SCUO computation handles the special cases for different organisms. Let $n_i$ represent the number of degenerate codons for amino acid i, so $1 \le j \le n_i$; for example, $1 \le j \le 6$ for leucine, $1 \le j \le 2$ for tyrosine, etc. For each sequence, let $x_{ij}$ represent the occurrence of synonymous codon j for amino acid i, $1 \le i \le 18$, $1 \le j \le n_i$. Normalizing the $x_{ij}$ by their sum over j gives the frequency of the jth degenerate codon for amino acid i in each sequence:

$$p_{ij} = \frac{x_{ij}}{\sum_{j=1}^{n_i} x_{ij}}$$
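The counting and normalization step can be sketched in Python as follows. This is a minimal illustration and not the CodonO implementation: the function names `synonymous_table` and `codon_frequencies` are hypothetical, only the standard genetic code is handled, the sequence is assumed to be in frame, and amino acids that do not occur in the sequence are simply skipped.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_table():
    """Map each amino acid with more than one codon to its list of codons."""
    table = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":  # ignore stop codons
            table[aa].append(codon)
    # Met and Trp have a single codon in the standard code and are dropped.
    return {aa: codons for aa, codons in table.items() if len(codons) > 1}


def codon_frequencies(sequence):
    """Return {amino acid: {codon: p_ij}} for an in-frame coding sequence."""
    seq = sequence.upper().replace("U", "T")
    # x_ij: occurrences of each complete codon; a trailing partial codon is ignored.
    counts = Counter(seq[k:k + 3] for k in range(0, len(seq) - 2, 3))
    freqs = {}
    for aa, codons in synonymous_table().items():
        total = sum(counts[c] for c in codons)
        if total:  # assumption: amino acids absent from the sequence are skipped
            freqs[aa] = {c: counts[c] / total for c in codons}
    return freqs
```

For example, `codon_frequencies("CTGCTGCTGTTA")` contains only leucine, with a frequency of 0.75 for CTG and 0.25 for TTA.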
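According to information theory, we define the entropy $H_{ij}$ of the jth codon of the ith amino acid in each sequence by

$$H_{ij} = -p_{ij} \log_2 p_{ij}$$

Summing over the codons representing amino acid i gives the entropy of the ith amino acid in each sequence:

$$H_i = \sum_{j=1}^{n_i} H_{ij} = -\sum_{j=1}^{n_i} p_{ij} \log_2 p_{ij}$$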
If the synonymous codons for the ith amino acid were used at random, one would expect a uniform distribution of them as representatives for the ith amino acid. Thus, the maximum entropy for the ith amino acid in each sequence is

$$H_i^{\max} = -\log_2 \frac{1}{n_i} = \log_2 n_i$$
If only one of the synonymous codons is used for the ith amino acid, i.e., the usage of the synonymous codons is biased to the extreme, then the ith amino acid in each sequence has the minimum entropy:

$$H_i^{\min} = 0$$
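The two limiting cases can be checked numerically. The sketch below assumes base-2 logarithms and uses the usual convention that a codon with zero frequency contributes nothing to the entropy; the function names `codon_entropy` and `max_entropy` are illustrative only.

```python
import math


def codon_entropy(freqs):
    """Shannon entropy in bits of one amino acid's codon frequencies."""
    # Codons with zero frequency are skipped (convention: 0 * log2(0) = 0).
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def max_entropy(n_codons):
    """Entropy of a uniform distribution over n_codons synonymous codons."""
    return math.log2(n_codons)
```

With uniform usage of four synonymous codons, `codon_entropy([0.25] * 4)` equals `max_entropy(4)`, which is 2 bits. When a single codon is used exclusively, as in `codon_entropy([1.0, 0.0, 0.0, 0.0])`, the entropy is zero.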
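The difference between the maximum entropy and the observed entropy, $H_i^{\max} - H_i$, serves as an information measure: it quantifies the non-randomness in synonymous codon usage and therefore describes the degree of organization of synonymous codon usage for the ith amino acid in each sequence.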
Let $O_i$ be the normalized difference between the maximum entropy and the observed entropy for the ith amino acid in each sequence, i.e.,

$$O_i = \frac{H_i^{\max} - H_i}{H_i^{\max}}$$
Obviously, $0 \le O_i \le 1$. When synonymous codon usage for the ith amino acid is random, $O_i = 0$. When this usage is biased to the extreme, $O_i = 1$. Thus, $O_i$ can be thought of as a measure of the bias in synonymous codon usage for the ith amino acid in each sequence. We designate the statistic $O_i$ as the synonymous codon usage order (SCUO) for the ith amino acid in each sequence.
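As a worked illustration of the per-amino-acid statistic, the sketch below computes the normalized entropy difference directly from codon counts. The function name `scuo_for_amino_acid` is hypothetical, base-2 logarithms are assumed, and the input is assumed to be the list of counts for all synonymous codons of one amino acid, including zeros for unused codons.

```python
import math


def scuo_for_amino_acid(counts):
    """Normalized entropy difference for one amino acid's codon counts."""
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        # The statistic is undefined without degeneracy or without observations.
        raise ValueError("need at least two synonymous codons and one occurrence")
    h_max = math.log2(len(counts))
    # Observed entropy; unused codons contribute nothing.
    h = -sum((x / total) * math.log2(x / total) for x in counts if x > 0)
    return (h_max - h) / h_max
```

For a two-codon amino acid, equal counts such as `[5, 5]` give 0, exclusive use of one codon such as `[10, 0]` gives 1, and a 3:1 split such as `[3, 1]` gives roughly 0.189.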
Let $F_i$ be the composition ratio of the ith amino acid in each sequence:

$$F_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{\sum_{i=1}^{18} \sum_{j=1}^{n_i} x_{ij}}$$
Then the average SCUO for each sequence can be represented as

$$\mathrm{SCUO} = \sum_{i=1}^{18} F_i O_i$$
The SCUO represents the overall synonymous codon usage order for the sequence.
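Putting the steps together, the whole computation for one sequence can be sketched as below. This is an illustrative sketch rather than the CodonO code: the function name `scuo` is hypothetical, only the standard genetic code is used (the mycoplasma case is not handled), base-2 logarithms are assumed, and amino acids that do not occur in the sequence are skipped because their composition ratio is zero.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def scuo(sequence):
    """Average synonymous codon usage order of an in-frame coding sequence."""
    seq = sequence.upper().replace("U", "T")
    counts = Counter(seq[k:k + 3] for k in range(0, len(seq) - 2, 3))

    # Group the codon counts x_ij by amino acid, ignoring stop codons.
    by_aa = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            by_aa[aa].append(counts[codon])

    # Keep amino acids with more than one codon that occur in the sequence.
    groups = [xs for xs in by_aa.values() if len(xs) > 1 and sum(xs) > 0]
    grand_total = sum(sum(xs) for xs in groups)
    if grand_total == 0:
        raise ValueError("sequence contains no codons of degenerate amino acids")

    result = 0.0
    for xs in groups:
        total = sum(xs)
        h_max = math.log2(len(xs))
        h = -sum((x / total) * math.log2(x / total) for x in xs if x > 0)
        o_i = (h_max - h) / h_max   # bias for this amino acid
        f_i = total / grand_total   # composition ratio
        result += f_i * o_i
    return result
```

A sequence that uses a single codon for every amino acid, such as `"CTGCTGCTGCTG"`, scores 1, while `"TATTAC"`, which uses both tyrosine codons equally, scores 0. Concatenating the two tyrosine codons with two CTG codons gives 0.5, since each amino acid then accounts for half of the counted codons.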