Synonymous Codon Usage Order (SCUO)



To implement the informatics method SCUO, we created a codon table for the amino acids that have more than one codon, indexed in an arbitrary way, so that we may unambiguously refer to the jth (degenerate) codon of amino acid i, 1 ≤ i ≤ 18. In mycoplasmas, Trp was also included in the codon table, because in these organisms the standard stop codon TGA encodes Trp (alongside TGG), so that 1 ≤ i ≤ 19. To simplify the explanation, the following description of the method is based only on the standard genetic code table, although the actual SCUO computation considered special cases for different organisms. Let ni represent the number of degenerate codons for amino acid i, so 1 ≤ j ≤ ni; for example, 1 ≤ j ≤ 6 for leucine, 1 ≤ j ≤ 2 for tyrosine, etc. For each sequence, let xij represent the number of occurrences of synonymous codon j for amino acid i, 1 ≤ i ≤ 18, 1 ≤ j ≤ ni. Normalizing the xij by their sum over j gives the frequency pij of the jth degenerate codon for amino acid i in each sequence:

$$p_{ij} = \frac{x_{ij}}{\sum_{j=1}^{n_i} x_{ij}}$$

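A minimal Python sketch of the codon table and of the frequencies pij, assuming a protein-coding DNA sequence read in frame 0 (no handling of ambiguous bases or RNA "U"). The function names synonymous_codon_table and codon_frequencies are hypothetical, not part of CodonO. The 64-letter string is the standard genetic code in TCAG order, and the mycoplasma flag only reassigns TGA to Trp, as described above.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def synonymous_codon_table(mycoplasma=False):
    """Map each amino acid with more than one codon to its (arbitrarily ordered) codons."""
    code = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}
    if mycoplasma:
        code["TGA"] = "W"  # TGA is read as Trp, so Trp gains a second codon
    table = {}
    for codon, aa in code.items():
        if aa != "*":  # stop codons are not part of the table
            table.setdefault(aa, []).append(codon)
    # Keep only degenerate amino acids (drops Met, and Trp in the standard code)
    return {aa: codons for aa, codons in table.items() if len(codons) > 1}


def codon_frequencies(seq, table):
    """Return {amino acid i: [p_i1, ..., p_in_i]} for amino acids present in seq."""
    seq = seq.upper()
    # Count codons in reading frame 0; a trailing partial codon is ignored
    counts = Counter(seq[k:k + 3] for k in range(0, len(seq) - 2, 3))
    freqs = {}
    for aa, codons in table.items():
        x = [counts[c] for c in codons]  # x_ij for j = 1..n_i
        total = sum(x)
        if total:  # p_ij is undefined when amino acid i does not occur
            freqs[aa] = [x_ij / total for x_ij in x]
    return freqs


table = synonymous_codon_table()
print(len(table), len(synonymous_codon_table(mycoplasma=True)))  # 18 19
print(codon_frequencies("TTATTATTGTAC", table))
```

In the usage line, the three leucine codons (TTA twice, TTG once) give frequencies 2/3 and 1/3 for those two codons and 0 for the other four leucine codons; the single TAC gives tyrosine frequencies 0 and 1.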
According to information theory, we define the entropy Hij of the jth codon of the ith amino acid in each sequence by

$$H_{ij} = -p_{ij} \log_2 p_{ij}$$

Summing over the codons representing amino acid i gives the entropy of the ith amino acid in each sequence:

$$H_i = \sum_{j=1}^{n_i} H_{ij} = -\sum_{j=1}^{n_i} p_{ij} \log_2 p_{ij}$$

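The two entropy definitions can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the usual information-theory convention 0 · log2 0 = 0, so that unused codons contribute nothing to Hi. Base 2 is used here; because SCUO is a ratio of entropies, the choice of base does not change the final statistic.

```python
import math


def codon_entropy(p_ij):
    """H_ij = -p_ij * log2(p_ij), using the convention 0 * log2(0) = 0."""
    return -p_ij * math.log2(p_ij) if p_ij > 0 else 0.0


def amino_acid_entropy(p_i):
    """H_i: sum of H_ij over the n_i synonymous codons of amino acid i."""
    return sum(codon_entropy(p) for p in p_i)


# Tyrosine (n_i = 2) with both codons used equally, and with one codon only
print(amino_acid_entropy([0.5, 0.5]))  # 1.0
print(amino_acid_entropy([1.0, 0.0]))  # 0.0
```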

If the synonymous codons for the ith amino acid were used at random, one would expect them to be uniformly distributed, i.e., pij = 1/ni for every j. Thus, the maximum entropy for the ith amino acid in each sequence is

$$H_i^{\max} = -\sum_{j=1}^{n_i} \frac{1}{n_i} \log_2 \frac{1}{n_i} = \log_2 n_i$$

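Under random usage each of the ni terms equals (1/ni) log2 ni, so the sum collapses to log2 ni. The short check below (function name hypothetical) compares that closed form with the explicit sum for a few amino acids of the standard code: tyrosine (2 codons), isoleucine (3) and leucine (6).

```python
import math


def max_entropy(n_i):
    """H_i^max = log2(n_i): entropy of n_i synonymous codons used uniformly."""
    return math.log2(n_i)


# Compare the closed form with the sum over a uniform distribution p_ij = 1/n_i
for name, n_i in [("Tyr", 2), ("Ile", 3), ("Leu", 6)]:
    uniform_sum = sum(-(1 / n_i) * math.log2(1 / n_i) for _ in range(n_i))
    print(name, n_i, round(uniform_sum, 4), round(max_entropy(n_i), 4))
```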
If only one of the synonymous codons is used for the ith amino acid, i.e., the usage of the synonymous codons is biased to the extreme, then the ith amino acid in each sequence has the minimum entropy:

$$H_i^{\min} = 0$$

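The difference between the maximum entropy and the observed entropy of the ith amino acid is an amount of information that measures the non-randomness in synonymous codon usage and therefore describes the degree of organization of synonymous codon usage for the ith amino acid in each sequence.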




Let SCUOi be the normalized difference between the maximum entropy and the observed entropy for the ith amino acid in each sequence, i.e.

$$\mathrm{SCUO}_i = \frac{H_i^{\max} - H_i}{H_i^{\max}}$$

Because the observed entropy Hi lies between the minimum entropy 0 and the maximum entropy Hi max, 0 ≤ SCUOi ≤ 1. When synonymous codon usage for the ith amino acid is random, SCUOi = 0; when this usage is biased to the extreme, SCUOi = 1. Thus, SCUOi can be thought of as a measure of the bias in synonymous codon usage for the ith amino acid in each sequence. We designate the statistic SCUOi as the synonymous codon usage order (SCUO) for the ith amino acid in each sequence.
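A minimal sketch of SCUOi computed directly from the raw codon counts xi1, ..., xini of one amino acid (the function name scuo_i is hypothetical). It raises an error when the amino acid does not occur, since pij, and hence SCUOi, is then undefined. The two printed cases reproduce the bounds discussed above: uniform usage gives 0 and single-codon usage gives 1.

```python
import math


def scuo_i(counts):
    """SCUO_i = (H_i^max - H_i) / H_i^max from raw codon counts x_i1..x_in_i."""
    n_i = len(counts)
    total = sum(counts)
    if n_i < 2 or total == 0:
        raise ValueError("need >= 2 synonymous codons and at least one occurrence")
    # H_i, skipping unused codons (0 * log2(0) is taken as 0)
    h = sum(-(x / total) * math.log2(x / total) for x in counts if x > 0)
    h_max = math.log2(n_i)  # H_i^max for uniform usage
    return (h_max - h) / h_max


print(scuo_i([5, 5, 5, 5]))   # 0.0, random (uniform) usage
print(scuo_i([20, 0, 0, 0]))  # 1.0, only one codon used
print(scuo_i([3, 1]))         # intermediate bias
```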

Let Fi be the composition ratio of the ith amino acid in each sequence:

$$F_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{\sum_{i=1}^{18} \sum_{j=1}^{n_i} x_{ij}}$$

Then the average SCUO for each sequence, weighted by amino acid composition, can be represented as

$$\mathrm{SCUO} = \sum_{i=1}^{18} F_i \, \mathrm{SCUO}_i$$

The SCUO represents the overall synonymous codon usage order for the sequence.
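The composition weighting can be sketched as follows, assuming the codon counts of each degenerate amino acid have already been tallied for one sequence (the function name and the input layout are hypothetical). As in the index range 1 ≤ i ≤ 18 above, the denominator of Fi runs over the degenerate amino acids only. Amino acids that never occur have Fi = 0 and an undefined SCUOi, so skipping them leaves the weighted sum unchanged.

```python
import math


def sequence_scuo(counts_by_aa):
    """SCUO = sum_i F_i * SCUO_i for one sequence.

    counts_by_aa maps each degenerate amino acid i to its codon counts
    [x_i1, ..., x_in_i]. Amino acids that never occur have F_i = 0 and an
    undefined SCUO_i, so they are skipped.
    """
    present = {aa: x for aa, x in counts_by_aa.items() if sum(x) > 0}
    if not present:
        raise ValueError("sequence contains no degenerate amino acids")
    grand_total = sum(sum(x) for x in present.values())  # sum over i and j of x_ij
    scuo = 0.0
    for x in present.values():
        total = sum(x)
        h = sum(-(c / total) * math.log2(c / total) for c in x if c > 0)  # H_i
        h_max = math.log2(len(x))  # H_i^max
        f_i = total / grand_total  # F_i
        scuo += f_i * (h_max - h) / h_max  # F_i * SCUO_i
    return scuo


# Hypothetical counts: Leu uses one codon only, Tyr uses both codons equally
example = {"L": [20, 0, 0, 0, 0, 0], "Y": [5, 5], "K": [0, 0]}
print(sequence_scuo(example))
```

In this example leucine contributes Fi = 2/3 with SCUOi = 1, tyrosine contributes Fi = 1/3 with SCUOi = 0, and lysine is absent, so the sequence SCUO is 2/3.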
