Purpose: Detect SNPs from sequencing reactions and a reference sequence.
You need to have these things ready:
| Button | Description |
| --- | --- |
| | When you first open trace files, their sequence is displayed, one pixel row per file, directly below the reference sequence. Bases that match the reference sequence are displayed in a lighter color. Forward and reverse sequences are shown separately. |
| | Below the sequences is a base frequency chart, where a dot for each base is plotted at a height that represents the number of sequences with that base at that position. In an unaligned sequence like this one, it just appears as a cloud. |
| Align | When you click Align, the sequences from the trace files are aligned to the reference sequence. Most of the aligned sequence matches the reference sequence, so it is easy to see possible variations. |
| | Aligning the sequence also cleans up the base frequency chart significantly, because most of the bases at most positions now match the reference (and, therefore, one another). Frequent variations therefore show up as dots that fall significantly below the rest. |
| Sort | This button sorts the sequences by how well they match the reference. The best sequences sort to the bottom. This helps you to distinguish between spurious variations that may be the result of poor quality and true variations. True variations may occur in any sequence, but spurious variations are more likely to be seen in sequences near the top. You may want to sort at different points in the process: both before finding indels and after trimming, for instance. |
| | Sorting does not affect the base frequency charts. |
| Trim | Click Trim to discard the bad sequence at the end of each sequencing reaction. This cleans a lot of noise out of the data. |
| | Trimming cleans up the base frequencies further, to such an extent that in the best regions (about the first half of the sequence), even bases where only one trace fails to match the reference are worthy of consideration. |
| Filter | Although this sequence doesn't show it very well, Filter cleans up the data further by eliminating sequences whose length is less than half of the distance between the two primers. |
| | When there are poor quality sequences, filtering improves the ability to distinguish good SNPs in the base frequency charts. |
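
The base frequency chart, Sort, and Filter steps described above can be illustrated with a minimal sketch. It is not InSNP's actual code. It assumes each aligned read is a string in reference coordinates, with `-` marking positions the read does not cover, and it uses a plain mismatch count as the measure of how well a read matches the reference. The function names and the `GAP` convention are hypothetical.

```python
from collections import Counter

GAP = "-"  # assumed placeholder for positions a read does not cover


def base_frequencies(reads):
    """For each alignment column, count how many reads carry each base."""
    length = max((len(r) for r in reads), default=0)
    columns = []
    for i in range(length):
        counts = Counter(r[i] for r in reads if i < len(r) and r[i] != GAP)
        columns.append(counts)
    return columns


def mismatch_count(read, reference):
    """Number of covered positions where the read differs from the reference."""
    return sum(1 for b, ref in zip(read, reference) if b != GAP and b != ref)


def sort_by_match(reads, reference):
    """Order reads worst-first, so the best-matching reads end up at the bottom."""
    return sorted(reads, key=lambda r: mismatch_count(r, reference), reverse=True)


def filter_short(reads, inter_primer_length):
    """Drop reads that cover less than half of the inter-primer distance."""
    def covered(read):
        return sum(1 for b in read if b != GAP)
    return [r for r in reads if covered(r) >= inter_primer_length / 2]
```

Because sorting only reorders the reads, `base_frequencies` returns the same counts before and after `sort_by_match`, which is consistent with the note that sorting does not affect the base frequency charts.
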

InSNP identifies possible SNPs by finding positions in the
sequences that differ from the reference sequence. For both forward and
reverse directions, it finds the first stretch of six consecutive bases
where there are no differences from the reference. This is the start of
good sequence. From that point through the position halfway between the primers,
it calls any bases where at least one sequence doesn't match the reference
a possible SNP. Then it takes the best half of the sequences and counts as a
possible SNP any base with variations in the range from 1/2 to 3/4 of the
inter-primer length.
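
The rule above can be sketched roughly as follows for one read direction. This is an illustration under stated assumptions, not InSNP's implementation: reads are strings in reference coordinates with index 0 at the primer end and `-` for uncovered positions, the six-base stretch is taken to mean six consecutive columns in which no read differs from the reference, "best" reads are those with the fewest mismatches, and the exact handling of the zone boundaries is a guess. All names are hypothetical.

```python
GAP = "-"  # assumed placeholder for positions a read does not cover
RUN = 6    # length of the variation-free stretch that marks good sequence


def has_variation(reads, reference, i):
    """True if any read covering column i differs from the reference there."""
    return any(i < len(r) and r[i] != GAP and r[i] != reference[i] for r in reads)


def mismatches(read, reference):
    """Number of covered positions where the read differs from the reference."""
    return sum(1 for b, ref in zip(read, reference) if b != GAP and b != ref)


def good_start(reads, reference):
    """First column of the first run of RUN variation-free columns, or None."""
    run = 0
    for i in range(len(reference)):
        if has_variation(reads, reference, i):
            run = 0
        else:
            run += 1
            if run == RUN:
                return i - RUN + 1
    return None


def candidate_snps(reads, reference):
    """Columns flagged as possible SNPs for one read direction."""
    n = len(reference)
    start = good_start(reads, reference)
    if start is None:
        return []
    half, three_quarters = n // 2, (3 * n) // 4
    # Zone 1: from the start of good sequence to the halfway point, any
    # mismatch in any read counts.
    hits = [i for i in range(start, half) if has_variation(reads, reference, i)]
    # Zone 2: from 1/2 to 3/4 of the length, only the best half of the reads
    # (fewest mismatches) is consulted.
    ranked = sorted(reads, key=lambda r: mismatches(r, reference))
    best = ranked[: max(1, len(ranked) // 2)]
    hits += [i for i in range(max(start, half), three_quarters)
             if has_variation(best, reference, i)]
    return hits
```

For reverse reads, the same function could be applied after reversing the coordinates so that index 0 is again the primer end.
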
When you click on a SNP position in the SNP list, it is highlighted in the sequence and base frequency charts, and the traces for nine bases around the position are displayed in the grouped traces pane. The traces in this pane are grouped according to their sequence: the most frequent sequence is displayed at the top, followed by the second-most frequent sequence, followed by all the others. These sequences are determined independently for forward and reverse traces.
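
As a rough sketch of the grouping described above (not InSNP's own code), reads can be bucketed by the nine-base window around the selected position and ranked by how many reads share each window. It assumes the window extends four bases on either side of the position and that reads are strings in reference coordinates. The function name is hypothetical, and it would be called once for forward reads and once for reverse reads.

```python
from collections import defaultdict

WINDOW = 9  # bases shown around the selected SNP position


def group_by_window(reads, position, window=WINDOW):
    """Group read indices by their sequence in the window around `position`.

    Returns (top, others): `top` holds the two most frequent window
    sequences with their read indices, `others` the indices of all
    remaining reads.
    """
    half = window // 2
    lo = max(0, position - half)
    hi = position + half + 1
    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[read[lo:hi]].append(idx)
    # Most frequent window sequence first; ties keep first-seen order.
    ranked = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    top = ranked[:2]
    others = [i for _, members in ranked[2:] for i in members]
    return top, others
```
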

In the grouped traces pane, you can quickly see how frequent the different bases are, whether the variation is seen in both forward and reverse reads, and how clean and consistent the traces look. This is often enough information to decide whether there is a true SNP at that position.
If the grouped traces are too messy, you can click on any of the groups to see the traces from that group individually. Here it can be easier to distinguish true heterozygotes from noise. You can control the scale of traces in the individual trace pane with the scale slider.
Further clues about which SNPs are real can be gleaned from the sequence chart; false SNPs tend to appear near the top of the chart, because the poorer quality sequences are at the top.
When you have decided whether a SNP is real, click Accept or Reject.
InSNP uses the file names to determine which sequencing files
are forward and which are reverse. Forward files must contain one of the
following elements: "_F_", "_F1_", "_F2_", "_F3_". All files that don't
contain one of these strings are considered to be reverse.
If you use the Open Forward or Open Reverse buttons, the traces will be
treated as all forward or all reverse, regardless of their file names.
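
The naming rule can be expressed as a short sketch. The four marker strings come from the text above. Everything else is an assumption for illustration: the function name, the case-sensitive match, the `force` parameter standing in for the Open Forward and Open Reverse buttons, and the `.ab1` file names in the usage example.

```python
from pathlib import Path

# Marker strings that identify a forward read, as listed in the text above.
FORWARD_MARKERS = ("_F_", "_F1_", "_F2_", "_F3_")


def read_direction(file_name, force=None):
    """Classify a trace file as "forward" or "reverse" from its name.

    `force` mimics opening files with Open Forward or Open Reverse:
    when set, it overrides whatever the file name says.
    """
    if force is not None:
        return force
    name = Path(file_name).name  # look at the file name only, not the folders
    if any(marker in name for marker in FORWARD_MARKERS):
        return "forward"
    return "reverse"


print(read_direction("sample01_F_A01.ab1"))  # forward
print(read_direction("sample01_R_A01.ab1"))  # reverse
```

Note that under this rule a name such as `sample01_F4_A01.ab1` is classified as reverse, because `_F4_` is not one of the listed strings.
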