Σgraphr
Upload one or more sequences in a single FAS/FASTA format file, and Σgraphr produces a cumulative (left-to-right) summation for each character in each sequence.
Want to run it locally or view the code? See the GitHub repo.
About Σgraphr
Σgraphr uses the same principles as the elegant SNAP (Synonymous Nonsynonymous Analysis Program) of Korber (Korber B. (2000). HIV Signature and Sequence Variation Analysis. Computational Analysis of HIV Molecular Sequences, Chapter 4, pages 55-72. Allen G. Rodrigo and Gerald H. Learn, eds. Dordrecht, Netherlands: Kluwer Academic Publishers).
It calculates a running sum (i.e. cumulative total, ∑) for each character starting from the left-hand end (5'-terminus or N-terminus) and outputs it as a column in a CSV text file (one file for each sequence in the input FAS file), so that the running sums can be graphed using Excel and compared. If a character is distributed evenly throughout the original sequence, its line on the graph rises with a constant slope. If it is distributed unevenly, the slope of its line changes along the sequence.
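The running-sum calculation described above can be sketched in a few lines of Python. This is only an illustration and is not taken from the Σgraphr code. The function names (read_fasta, running_sums, to_csv), the leading "position" column and the alphabetical column order are assumptions made for the example.

```python
import csv
import io


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def running_sums(sequence):
    """Return (alphabet, rows); each row is [position, count per character]."""
    alphabet = sorted(set(sequence))  # assumed column order: alphabetical
    counts = dict.fromkeys(alphabet, 0)
    rows = []
    # Walk from the left-hand end (5'-terminus or N-terminus).
    for position, char in enumerate(sequence, start=1):
        counts[char] += 1
        rows.append([position] + [counts[c] for c in alphabet])
    return alphabet, rows


def to_csv(sequence):
    """Render the running sums of one sequence as CSV text."""
    alphabet, rows = running_sums(sequence)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["position"] + alphabet)  # assumed header layout
    writer.writerows(rows)
    return buffer.getvalue()
```

In this sketch, calling to_csv on each sequence returned by read_fasta gives one CSV text per input sequence, with one column per character, ready to be plotted as lines against position.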
Examples of Σgraphr plots are shown below. The first pair shows the Σgraphr plot of the concatenated main ORFs of the genome of the type strain of tobacco mosaic virus (TMV; NC_001367), and that of its encoded amino acids. The four nucleotides are evenly spread throughout the concatenated ORFs, with no clear indication that the encoded proteins affect the slope of the graph. However, the Σgraphr plot of the encoded amino acids shows clear differences in composition between the replicase, the movement protein and the coat protein. We interpret this as probably revealing an underlying bias in the functional chemistry of the TMV replicase that affects all parts of the genome evenly, whereas the selection permitted by the genetic code allows the individual encoded proteins to be differentially selected, which requires an uneven distribution of the amino acids.
The second pair of graphs is from alfalfa mosaic virus, which has a genome split into three parts: a replicase gene (RNA1; NC_001495), a movement protein (RNA2; NC_002024) and a coat protein (RNA3; NC_002025). When the ORFs from these are concatenated into a single sequence and plotted with Σgraphr, the nucleotides are again evenly distributed across the concatenated genome segments, whereas the encoded amino acids are not. We interpret this in the same way as for TMV.