Contact us for help and possible new features:
Harry Gibbs - harrymanninggibbs@gmail.com
Adrian Gibbs - adrian_j_gibbs@hotmail.com

Σgraphr

Upload one or more sequences in a single FAS/FASTA format file, and Σgraphr produces a cumulative (left-to-right) summation for each character in each of the sequences.

Want to run it locally or view the code? See the GitHub repo.

Click to select a .fas file, or drag & drop it here
(After you choose a file, hit Convert & Download.)

If the input contains a single sequence, your browser downloads a CSV. If it contains multiple sequences, you receive a ZIP of CSVs.
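As a rough illustration of the single-CSV versus ZIP behaviour, the sketch below packages per-sequence CSV text entirely in memory. It is not the site's actual code: the function name package_outputs and its parameters are hypothetical, and the file names simply follow the pattern given in the Download step.

```python
import io
import zipfile


def package_outputs(csv_by_name, mode="singles", input_name="input"):
    """Return (filename, payload_bytes) for one or many per-sequence CSV texts.

    csv_by_name maps a sequence name to its CSV text (hypothetical structure).
    """
    suffix = f"_{mode}_sigmagraphr_output"
    files = {f"{name}{suffix}.csv": text for name, text in csv_by_name.items()}

    # One sequence: hand back the CSV itself.
    if len(files) == 1:
        filename, text = next(iter(files.items()))
        return filename, text.encode("utf-8")

    # Several sequences: build a ZIP archive in memory, one CSV per sequence.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for filename, text in files.items():
            archive.writestr(filename, text)
    return f"{input_name}{suffix}.zip", buffer.getvalue()


single_name, _ = package_outputs({"seq1": "A,C\n1,0\n"})
multi_name, _ = package_outputs({"seq1": "A\n1\n", "seq2": "C\n1\n"}, input_name="genes")
print(single_name)
print(multi_name)
```

The sketch does not sanitise sequence names, so names containing characters that are invalid in file names would need extra handling.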

How to use Σgraphr

  1. Select mode: Choose Singles to count individual letters, or Triplets to count codons (groups of 3). Both T and U are accepted in triplets mode.
  2. Select your file: The sequences must be in FAS/FASTA format. A valid FAS sequence starts with a line beginning with > followed by the sequence name, then one or more lines of letters (e.g. A, C, G, T or single-letter amino-acid codes).
  3. Convert: The file is analysed in memory only and is not stored on the server.
  4. Download: If the input contains a single sequence, your browser saves [sequencename]_(singles|triplets)_sigmagraphr_output.csv. If the input contains multiple sequences, you receive [inputname]_(singles|triplets)_sigmagraphr_output.zip containing one CSV per sequence, each named [sequencename]_(singles|triplets)_sigmagraphr_output.csv.
  5. Display Results: Open each output CSV file in Excel or Google Sheets. Each column shows the running (cumulative) total for one character or triplet, counted left to right across the sequence. Insert the columns into a line chart to compare them.
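To check a file before uploading, the FAS/FASTA layout described in step 2 can be read with a few lines of Python. This is a minimal sketch, not Σgraphr's own parser: the function name parse_fasta is hypothetical, and it assumes that blank lines and any text before the first > header are ignored.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = ">seq1\nACGT\nAC\n>seq2\nGGTA\n"
print(parse_fasta(example))  # [('seq1', 'ACGTAC'), ('seq2', 'GGTA')]
```

An empty result from a sketch like this corresponds to the "No sequences parsed" case: the file has no line starting with >.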

What the tool does

For each sequence in your FAS file, Σgraphr identifies the characters present (e.g. nucleotides or amino acids or others) and calculates a left-to-right running sum (∑) for each character. Evenly distributed characters form smooth, steadily rising lines; uneven distributions appear as lines that rise irregularly.
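The running sum can be sketched in a few lines of Python. This illustrates the idea only and is not Σgraphr's implementation: the function name running_sums is hypothetical, and the alphabetical column order is an assumption.

```python
def running_sums(sequence):
    """Return (alphabet, rows); rows[i][j] counts alphabet[j] in sequence[:i + 1]."""
    alphabet = sorted(set(sequence))  # assumption: columns in alphabetical order
    counts = dict.fromkeys(alphabet, 0)
    rows = []
    for ch in sequence:
        counts[ch] += 1
        # Snapshot of every character's cumulative count at this position.
        rows.append([counts[c] for c in alphabet])
    return alphabet, rows


alphabet, rows = running_sums("AACGA")
print(alphabet)  # ['A', 'C', 'G']
for row in rows:
    print(row)
# [1, 0, 0]
# [2, 0, 0]
# [2, 1, 0]
# [2, 1, 1]
# [3, 1, 1]
```

Each column never decreases, and the last row holds the total count of each character, so a character spread evenly along the sequence climbs at a near-constant rate.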

Triplets mode: the sequence is read in groups of three characters (codons), and the output columns are cumulative counts per triplet (ordered deterministically by A → G → C → U/T).
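A hedged sketch of triplets mode follows. Several details are assumptions rather than documented behaviour: U is rewritten as T so that equivalent codons share a column, a trailing incomplete triplet is dropped, only triplets that occur get a column, and triplets containing other letters sort last. The names RANK and triplet_running_sums are hypothetical.

```python
# Column order A -> G -> C -> U/T, as described for triplets mode.
RANK = {"A": 0, "G": 1, "C": 2, "T": 3}


def triplet_running_sums(sequence):
    """Return (columns, rows) of cumulative counts per triplet."""
    seq = sequence.upper().replace("U", "T")  # assumption: U treated as T
    usable = len(seq) - len(seq) % 3          # assumption: drop incomplete tail
    triplets = [seq[i:i + 3] for i in range(0, usable, 3)]

    # Sort by per-letter rank; unknown letters get rank 4 (assumption).
    columns = sorted(set(triplets), key=lambda t: ([RANK.get(c, 4) for c in t], t))

    counts = dict.fromkeys(columns, 0)
    rows = []
    for t in triplets:
        counts[t] += 1
        rows.append([counts[c] for c in columns])
    return columns, rows


columns, rows = triplet_running_sums("AUGGCCAUGUA")
print(columns)  # ['ATG', 'GCC']
print(rows)     # [[1, 0], [1, 1], [2, 1]]
```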

Troubleshooting

  • “No sequences parsed”: The file may not be in FAS/FASTA format. Ensure the first line starts with > and that sequence lines contain only valid letters.
  • Empty or very short output: Check that the sequence contains valid characters (e.g. A, C, G, T for nucleotides).
  • Large files: Larger inputs may take longer to process; a desktop browser is recommended.

Privacy: your data is processed in memory and discarded immediately after the CSVs are generated.

About Σgraphr

Σgraphr uses the same principles as the elegant SNAP (Synonymous Nonsynonymous Analysis Program) of Korber (Korber B. (2000). HIV Signature and Sequence Variation Analysis. Computational Analysis of HIV Molecular Sequences, Chapter 4, pages 55-72. Allen G. Rodrigo and Gerald H. Learn, eds. Dordrecht, Netherlands: Kluwer Academic Publishers).

It calculates a running sum (i.e. cumulative total, ∑) for each character starting from the left-hand end (5'-terminus or N-terminus) and outputs it as a column in a CSV text format file (one file for each input FAS sequence), so that running sums can be graphed using Excel and compared. If a character is distributed evenly throughout the original sequence then its line on the graph will increase uniformly. However, if it is distributed unevenly throughout the sequence then its line on the graph will increase irregularly.
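For readers who want to reproduce this kind of output locally, here is a minimal sketch that writes the running sums as CSV text in memory. The exact layout of Σgraphr's CSV files is not documented here, so the leading position column, the alphabetical column order and the function name running_sums_csv are all assumptions.

```python
import csv
import io


def running_sums_csv(sequence):
    """Return CSV text with one row per position and one column per character."""
    alphabet = sorted(set(sequence))  # assumption: alphabetical column order
    counts = dict.fromkeys(alphabet, 0)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["position"] + alphabet)  # assumption: header with a position column
    for position, ch in enumerate(sequence, start=1):
        counts[ch] += 1
        writer.writerow([position] + [counts[c] for c in alphabet])
    return buffer.getvalue()


print(running_sums_csv("ACCA"))
# position,A,C
# 1,1,0
# 2,1,1
# 3,1,2
# 4,2,2
```

Plotting each character column against position as a line chart gives the kind of graph described above.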

Examples of Σgraphr plots are shown below. The first pair shows the Σgraphr plot of the concatenated main ORFs of the genome of the type strain of tobacco mosaic virus (TMV; NC_001367), and that of its encoded amino acids. The four nucleotides are evenly spread throughout the concatenated ORFs, with no clear indication that the encoded proteins affect the slope of the graph. However, the Σgraphr plot of the encoded amino acids shows clear differences in composition between the replicase gene, the movement protein and the coat protein. We interpret this as probably revealing an underlying bias in the functional chemistry of the TMV replicase that affects all parts of the genome evenly, whereas selection permitted by the genetic code allows the individual encoded proteins to be differentially selected, which requires an uneven distribution of the amino acids.

The second pair of graphs is from alfalfa mosaic virus, which has a genome split into three parts: a replicase gene (RNA1; NC_001495), a movement protein (RNA2; NC_002024) and a coat protein (RNA3; NC_002025). When the ORFs from these are appended to form a single sequence and plotted with Σgraphr, the nucleotides of the concatenated genome segments are again evenly distributed, whereas the encoded amino acids are not, and we interpret this as we did for TMV.

Figure: Example Σgraphr plots
Figure: Example Σgraphr triplet plot