CZ GEN EPI offers access to a tool that can place sequences onto existing, pre-calculated trees rather than building trees from scratch with your data. Below are steps to place your samples onto a global SARS-CoV-2 phylogeny using the Ultrafast Sample placement on Existing tRees (UShER) tool.
After reading this user guide, you will be able to:
- Run UShER to place your samples onto a SARS-CoV-2 phylogenetic tree.
- Become familiar with the UShER results table.
- Visualize your samples on an interactive phylogenetic subtree using Nextstrain.
Running an UShER phylogenetic placement
You can quickly evaluate how your samples compare to samples present in a global SARS-CoV-2 phylogenetic tree using UShER. To place your samples within a predetermined and updated SARS-CoV-2 phylogenetic tree:
- Select the samples you wish to place on the phylogenetic tree and click on ‘Run UShER Phylogenetic Placement’ from the ‘Run Phylogenetic Analysis’ dropdown menu on the right-hand side of your Sample Page.
Select samples for phylogenetic placement and click on the tree icon representing the ‘Run Phylogenetic Analysis’ menu on the right-hand side of the Sample Page.
Select ‘UShER Phylogenetic Placement’ from the dropdown menu.
- A dialog box will appear where you can specify your analysis settings.
Dialog box for selecting settings for UShER phylogenetic placements.
You can specify the following:
Phylogenetic tree version: You can choose between two phylogenetic tree versions, which contain high-quality sequences from either three or four databases. Both versions are updated daily by UShER. For SARS-CoV-2 sequence placements, we recommend the four-database version, which includes sequences from GISAID (the largest repository of SARS-CoV-2 sequences).
Number of samples per subtree showing sample placement: You can select the number of samples that will be included in the subtree where your samples are placed. We recommend including at least 50 samples in the subtree when running UShER for fewer than 10 sequences. When running more than 10 sequences, we recommend including at least 5 times the number of selected samples (for example, if you are running UShER for 20 samples, you should include at least 100 samples in the subtree).
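The subtree size recommendation above can be written as a small calculation. The sketch below is only an illustration of that rule of thumb; the function name `recommended_subtree_size` is hypothetical and is not part of CZ GEN EPI or UShER.

```python
def recommended_subtree_size(n_samples: int) -> int:
    """Return the minimum subtree size suggested by the guidance above.

    Hypothetical helper for illustration only; it is not a CZ GEN EPI
    or UShER function.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Fewer than 10 samples: at least 50 samples in the subtree.
    # More than 10 samples: at least 5 times the number of samples.
    # At exactly 10 samples both rules give 50, so max() covers every case.
    return max(50, 5 * n_samples)


if __name__ == "__main__":
    for n in (3, 10, 20):
        print(n, "samples -> at least", recommended_subtree_size(n), "per subtree")
```

For the example in the text, 20 selected samples gives a minimum of 100 samples per subtree.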
- After selecting your settings, click on ‘Create Placement’. A warning message will ask you to confirm that you want to continue the analysis outside of the platform, because UShER is a separate service that is not run within CZ GEN EPI. If you agree to continue, the analysis will run on the University of California, Santa Cruz (UCSC) UShER visualization service.
Once you select your settings, click on ‘Create Placement’. If you agree with the warning message that follows, click ‘Continue’ to initiate the UShER phylogenetic placement of your samples.
- After you agree to continue with the analysis, a new tab will open where the UShER results will appear. The page may seem blank or unresponsive while the placement analysis is running. Please be patient, as the analysis might take a few minutes.
- When the phylogenetic placement is done, an output table will appear outlining the results of the analysis.
UShER output table showing phylogenetic placement results for two sequences. In this example, both sequences represent the same Nextstrain clade. Sequence ‘Ex2’ is more divergent than ‘Ex1’ when compared to sequences on the pre-calculated tree.
The table shows the following information at a glance:
Nextstrain clade: You will see the Nextstrain clade assignment of your samples.
Neighboring sample in tree: You will see the name and accession number of a sample that is already present in the pre-calculated tree that is closely related to your sample.
Lineage of neighbor: You will see if the closest neighbor to your sample falls within the same lineage.
Maximally parsimonious placements: You will see the number of potentially good placements for your sample on the tree. This value represents an uncertainty measurement. If a sample can be reasonably placed in many places in the tree, we are less confident in which placement is the correct one. The lower the number (ideally 1), the more confident the placement will be.
Parsimony score: Refers to the number of changes that must be added to the tree when placing your sample. The higher the parsimony score, the more diverged or different your uploaded sample is compared to samples on the predetermined tree.
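If you copy the table values into a script, the two uncertainty-related columns can be used to decide which samples to look at first. The sketch below assumes you have typed the values in by hand; the `PlacementRow` class, its field names, and the numbers in the example are hypothetical and are not produced by UShER or CZ GEN EPI.

```python
from dataclasses import dataclass


@dataclass
class PlacementRow:
    """One row of the results table (hypothetical field names)."""
    name: str
    parsimony_score: int  # changes added to the tree when placing the sample
    n_placements: int     # number of maximally parsimonious placements


def is_ambiguous(row: PlacementRow) -> bool:
    # A single placement is the ideal case; more than one placement means
    # the sample fits equally well in several places on the tree.
    return row.n_placements > 1


def review_order(rows: list[PlacementRow]) -> list[PlacementRow]:
    # Put the least certain samples first, then the most divergent ones.
    return sorted(rows, key=lambda r: (-r.n_placements, -r.parsimony_score))


if __name__ == "__main__":
    # Made-up values for illustration only.
    rows = [PlacementRow("Ex1", 1, 1), PlacementRow("Ex2", 4, 3)]
    for row in review_order(rows):
        note = "check placement" if is_ambiguous(row) else "single placement"
        print(row.name, row.parsimony_score, row.n_placements, note)
```

Sorting by the number of placements before the parsimony score is a design choice made for this sketch: it lists samples with uncertain placement ahead of samples that are simply divergent.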
Visualizing UShER sample placement on a phylogenetic tree
You can visualize the calculated UShER subtree(s) with your samples in Nextstrain. Note that the analysis will result in more than one subtree when your uploaded samples are not closely related to each other and fall within different parts or clades of the global SARS-CoV-2 phylogenetic tree. To visualize the calculated UShER subtrees:
- Go to the last column of the output table (‘Subtree number’) and click on the link ‘view in Nextstrain’ for a given sample. Alternatively, scroll down the UShER results page and click on ‘view subtree #’, where # represents the number for a given sample (see ‘Subtree number’ column to find the number for each sample).
UShER results page highlighting options to visualize subtrees in Nextstrain.
- When you click ‘view subtree in Nextstrain’ you will be directed to a new page where you will see the newly created UShER subtree in Nextstrain.
Example of default Nextstrain tree including an uploaded sample highlighting the sample legend, the placement of the uploaded sample, and the ‘Mutations’ axis.
- The default tree color scheme based on Nextstrain clades might make it difficult to distinguish your sample(s). You can adjust the color scheme using the ‘Color by’ dropdown menu on the left-hand side of the screen.
Since default colors might be hard to distinguish, you can edit the color scheme by clicking on the ‘Color By’ dropdown menu.
Example of Nextstrain tree color scheme based on SARS-CoV-2 lineage.
Example of Nextstrain tree color scheme based on sample type.